
Bug#177646: libc6: iconv '-c' option, still buggy with //TRANSLIT



At Tue, 2 Sep 2003 02:29:24 +0200,
R� Kuhlmann wrote:
> > It is easy to see that only the first invalid character is omitted. 
> 
> No. The situation is a bit different. There are three kinds of errors that
> can occur when doing a character conversion:
> 
> a) the input sequence is valid so far, but incomplete
> b) the input sequence is invalid
> c) the input sequence is valid, but the character it represents is
>    unrepresentable in the target encoding
> 
> To distinguish these three failure conditions, iconv() may return one of two
> error values, namely EINVAL for a) and EILSEQ for b). It does return EILSEQ
> for c) as well, although its man page iconv(3) ignores this case.
> 
> Now it might be just the problem that the -c option only hides one of the
> two cases b) or c). However, I cannot reproduce this problem on the current
> glibc;

Exactly.  SUSv3 says:

   -c  Omit any invalid characters from the output. When -c is not
       used, the results of encountering invalid characters in the
       input stream (either those that are not valid members of the
       fromcode or those that have no corresponding value in tocode)
       shall be specified in the system documentation. The presence or
       absence of -c shall not affect the exit status of iconv.

My reading of this: if we encounter (a) or (b), the behavior is
undefined.  If we encounter (c), the character is omitted.  Current
glibc handles this correctly.
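
For illustration, the three cases (a), (b) and (c) can be told apart with
a small sketch.  This uses Python's codecs module, not glibc's iconv(), so
it only models the distinction; the function names classify() and
omit_invalid() are made up for this example, and omit_invalid() shows just
one possible reading of "-c" in which both invalid input and
unrepresentable characters are dropped.

```python
import codecs

def classify(data: bytes, fromcode: str = "utf-8", tocode: str = "ascii") -> str:
    """Return 'a', 'b', 'c' or 'ok' for the three failure cases (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(fromcode)()
    try:
        # final=False keeps a trailing partial sequence buffered instead of failing
        text = decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return "b"  # invalid input sequence (iconv() reports EILSEQ)
    if decoder.getstate()[0]:
        return "a"  # valid so far, but incomplete (iconv() reports EINVAL)
    try:
        text.encode(tocode)
    except UnicodeEncodeError:
        return "c"  # valid input, unrepresentable in the target encoding
    return "ok"

def omit_invalid(data: bytes, fromcode: str = "utf-8", tocode: str = "ascii") -> bytes:
    """Drop invalid input bytes and unrepresentable characters (one reading of -c)."""
    return data.decode(fromcode, errors="ignore").encode(tocode, errors="ignore")
```

With the bytes from the example quoted below, classify(b"a\xc3") falls
under (a), classify(b"a\xf6a") under (b), and classify(b"\xc3\xb6") under
(c) when the target is US-ASCII.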

> however, I do not have your file to check. I _can_, however, reproduce
> this problem when the //TRANSLIT option is used, because //TRANSLIT and
> //IGNORE don't match - the -c will just suppress the error message, but transliteration
> will have stopped at the first invalid sequence.
> 
> $ echo -e "a\xc3\xb6a\xf6a\xc3\xb6" | LC_ALL=de_DE iconv -f UTF-8 -t US-ASCII//TRANSLIT
> aoeaiconv: ungültige Eingabe-Sequenz an der Stelle 4
> $
> 
> (each sequence is an ö in either UTF-8 or ISO-8859-1)

No, your example did not use the "-c" option.

So this is not a bug.  May I close this bug?

Regards,
-- gotom


