
Bug#316147: marked as done (iconv: options for illegal characters)



Your message dated Sun, 15 Apr 2007 02:48:31 +0200
with message-id <20070415004831.GA22150@artemis>
and subject line Bug#316147: iconv: options for illegal characters
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Package: libc6
Version: 2.3.2.ds1-22
Severity: wishlist
File: /usr/bin/iconv
Tags: upstream

-c is nice, but it would be nice to know just how many invalid
characters were omitted from the output.
--verbose won't say, but should.

$ iconv -f gb2312 -t big5 gdxw08.htm | wc -c
iconv: illegal input sequence at position 906
906
$  iconv -f gb2312 -t big5 -c gdxw08.htm | wc -c - gdxw08.htm
   4585 -
   4585 gdxw08.htm
   9170 total

The man page says "Omit invalid characters from output". Maybe it
should say more, like "just send the character it can't deal with
through to the output unconverted".

Or better yet, give the user the choice of deleting them, sending them
through, or redirecting them, etc.

Greater still would be an option to "mark unconvertible characters
with @--> <--@ [or customizable]"


--- End Message ---
--- Begin Message ---
On Wed, Jun 29, 2005 at 01:53:33AM +0800, Dan Jacobson wrote:
> Package: libc6
> Version: 2.3.2.ds1-22
> Severity: wishlist
> File: /usr/bin/iconv
> Tags: upstream
> 
> -c is nice, but it would be nice to know just how many invalid
> characters were omitted from the output.
> --verbose won't say, but should.
> 
> $ iconv -f gb2312 -t big5 gdxw08.htm | wc -c
> iconv: illegal input sequence at position 906
> 906
> $  iconv -f gb2312 -t big5 -c gdxw08.htm | wc -c - gdxw08.htm
>    4585 -
>    4585 gdxw08.htm
>    9170 total

  iconv is meant to be strict. If you want it to skip characters it
cannot convert, append //IGNORE to the target encoding name, or use
//TRANSLIT to try approximate transliterations.
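
  As a rough illustration of strict versus lenient conversion, here is
a minimal Python sketch. It uses Python's built-in codec error modes,
not iconv itself, and the function name decode_modes is made up for
this example. "strict" stops at the first bad byte, much as iconv does
by default, "ignore" drops bad bytes, and "replace" substitutes U+FFFD.

```python
def decode_modes(data: bytes, encoding: str) -> dict:
    """Decode the same bytes under three error modes and collect the results."""
    results = {}
    try:
        results["strict"] = data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        # Comparable to iconv stopping at an illegal input sequence.
        results["strict"] = f"error at position {exc.start}"
    # Silently drop bytes that cannot be decoded.
    results["ignore"] = data.decode(encoding, errors="ignore")
    # Keep a visible placeholder (U+FFFD) where a byte could not be decoded.
    results["replace"] = data.decode(encoding, errors="replace")
    return results


if __name__ == "__main__":
    # 0xFF is not valid ASCII, so strict decoding fails at offset 2.
    for mode, outcome in decode_modes(b"ab\xffcd", "ascii").items():
        print(mode, repr(outcome))
```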

  If you want more subtle ways, recode is the tool you want.
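
  The counting and marking asked for in the report can also be sketched
outside iconv. The following is a minimal Python example, assuming the
problem is undecodable input bytes. The class CountingMarker, the
function decode_and_mark, and the handler name "count_and_mark" are
hypothetical names chosen for this sketch; only codecs.register_error
is a real library call. Bad bytes are shown as hex between the
@--> <--@ markers suggested in the report.

```python
import codecs


class CountingMarker:
    """Codec error handler that marks undecodable bytes and counts them."""

    def __init__(self, open_mark="@-->", close_mark="<--@"):
        self.open_mark = open_mark
        self.close_mark = close_mark
        self.count = 0  # number of invalid sequences seen so far

    def __call__(self, exc):
        # This sketch only handles decoding errors; re-raise anything else.
        if not isinstance(exc, UnicodeDecodeError):
            raise exc
        bad = exc.object[exc.start:exc.end]
        self.count += 1
        # Return the replacement text and the offset at which to resume.
        return f"{self.open_mark}{bad.hex()}{self.close_mark}", exc.end


def decode_and_mark(data, encoding, open_mark="@-->", close_mark="<--@"):
    """Decode data, marking invalid sequences; return (text, invalid_count)."""
    handler = CountingMarker(open_mark, close_mark)
    # Re-registering under the same name replaces any earlier handler.
    codecs.register_error("count_and_mark", handler)
    text = data.decode(encoding, errors="count_and_mark")
    return text, handler.count


if __name__ == "__main__":
    text, invalid = decode_and_mark(b"ab\xffcd", "ascii")
    print(text)
    print("invalid sequences:", invalid)
```

  For a file such as gdxw08.htm from the report, one would read it in
binary mode and call decode_and_mark(data, "gb2312"), then encode the
result to the target encoding. How many bytes each reported sequence
covers depends on the codec.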

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org



--- End Message ---
