
Bug#372515: marked as done (iconv(): Returns EILSEQ when it can't convert to the output encoding.)



Your message dated Sun, 15 Apr 2007 01:25:44 +0200
with message-id <20070414232544.GA19608@artemis>
and subject line Bug#372515: iconv(): Returns EILSEQ when it can't convert to the output encoding.
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Package: libc6
Version: 2.3.6-15
Severity: important

Hi,

It seems that iconv() returns -1 and sets errno to EILSEQ on valid
input that it can't convert to the output encoding.  It shouldn't be
doing that, since it is valid input.

This can be shown simply using the iconv util, since it reacts
the same.  A simple latin1 file:
$ cat test.txt
tést
$ iconv -f latin1 -t ASCII test.txt > /dev/null
iconv: illegal input sequence at position 1
$ iconv -f latin1 -t UTF-8 test.txt > /dev/null
$ 

From the manpage:
       EILSEQ An invalid multibyte sequence has been encountered in the input.

From Single Unix Specification 3:
   [EILSEQ]
          Input conversion stopped due to an input byte that does not
          belong to the input codeset.

It also says:
     If iconv() encounters a character in the input buffer that is
     valid, but for which an identical character does not exist in the
     target codeset, iconv() shall perform an implementation-defined
     conversion on this character.

Instead of doing an "implementation-defined conversion", it's
returning an error and saying the input is invalid, while the
input is clearly valid.  I would rather it actually followed the
standard and did some conversion, even if it just turned the
character into a '?' or something.


Kurt



--- End Message ---
--- Begin Message ---
On Fri, Jun 09, 2006 at 10:12:26PM +0200, Kurt Roeckx wrote:
> Package: libc6
> Version: 2.3.6-15
> Severity: important
> 
> Hi,
> 
> It seems that iconv() returns -1 and sets errno to EILSEQ on valid
> input that it can't convert to the output encoding.  It shouldn't be
> doing that, since it is valid input.
> 
> This can be shown simply using the iconv util, since it reacts
> the same.  A simple latin1 file:
> $ cat test.txt
> tést
> $ iconv -f latin1 -t ASCII test.txt > /dev/null
> iconv: illegal input sequence at position 1
> $ iconv -f latin1 -t UTF-8 test.txt > /dev/null
> $ 
> 
> From the manpage:
>        EILSEQ An invalid multibyte sequence has been encountered in the input.
> 
> From Single Unix Specification 3:
>    [EILSEQ]
>           Input conversion stopped due to an input byte that does not
>           belong to the input codeset.
> 
> It also says:
>      If iconv() encounters a character in the input buffer that is
>      valid, but for which an identical character does not exist in the
>      target codeset, iconv() shall perform an implementation-defined
>      conversion on this character.
> 
> Instead of doing an "implementation-defined conversion", it's
> returning an error and saying the input is invalid, while the
> input is clearly valid.  I would rather it actually followed the
> standard and did some conversion, even if it just turned the
> character into a '?' or something.

  By default, iconv is strict, and won't silently replace bad chars. If
you want it to perform approximations, you can ask for ascii//TRANSLIT
rather than ascii, or ascii//IGNORE to ignore untranslatable
characters.
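
  To illustrate the difference between strict conversion, approximation
and dropping characters, here is a minimal sketch. It uses Python's
built-in codec error handlers, not glibc's iconv, so it is only an
analogy: the to_ascii() helper and its mode names are made up for this
example, and the "approximate" mode (Unicode decomposition followed by
dropping non-ASCII) is a rough stand-in for //TRANSLIT, whose actual
output depends on the iconv implementation and locale.

```python
import unicodedata

# The report's test data: "tést" stored as Latin-1 bytes.
latin1_bytes = b"t\xe9st"
text = latin1_bytes.decode("latin-1")


def to_ascii(s, mode="strict"):
    """Encode s as ASCII using one of four illustrative policies.

    Hypothetical helper for this example only; it is not part of iconv.
    """
    if mode == "approximate":
        # Decompose accented letters (é becomes e + combining accent),
        # then drop whatever ASCII cannot hold. Rough analogue of //TRANSLIT.
        decomposed = unicodedata.normalize("NFKD", s)
        return decomposed.encode("ascii", errors="ignore")
    # "strict" raises UnicodeEncodeError, "replace" substitutes '?',
    # "ignore" silently drops the character (rough analogue of //IGNORE).
    return s.encode("ascii", errors=mode)


if __name__ == "__main__":
    try:
        to_ascii(text)
    except UnicodeEncodeError as exc:
        print("strict: refused,", exc.reason)
    for mode in ("replace", "ignore", "approximate"):
        print(mode + ":", to_ascii(text, mode))
```

  The strict mode fails on valid but unrepresentable input, which is the
behaviour the report describes; the other modes trade that safety for
lossy output.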

  If you want a less strict tool, recode is what you seek. iconv is
meant to be a very strict one, usable for instance to check the validity
of an encoded string (iconv -f utf8 -t utf8 < foo > /dev/null is a trick
to verify that a text is valid utf8).
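
  The same kind of validity check can be sketched in Python, again as
an analogy and not as a description of how iconv works internally. The
is_valid_utf8() name is an assumption made for this example; it relies
only on the standard strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8, False otherwise.

    Hypothetical helper for this example only.
    """
    try:
        # Strict decoding rejects malformed sequences instead of
        # replacing or skipping them.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # "tést" encoded as UTF-8 is accepted.
    print(is_valid_utf8("t\u00e9st".encode("utf-8")))
    # The Latin-1 bytes from the report are not valid UTF-8:
    # 0xE9 starts a multibyte sequence that is never completed.
    print(is_valid_utf8(b"t\xe9st"))
```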

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

Attachment: pgpOtlY70iybz.pgp
Description: PGP signature


--- End Message ---
