
Bug#177646: libc6: iconv '-c' option, still buggy with //TRANSLIT



> It is easy to see that only the first invalid character is omitted. 

No. The situation is a bit different. There are three kinds of errors that
can occur when doing a character conversion:

a) the input sequence is valid so far, but incomplete
b) the input sequence is invalid
c) the input sequence is valid, but the character it represents is
   unrepresentable in the target encoding

To distinguish these three failure conditions, iconv() has only two error
values available: it fails with errno set to EINVAL for a) and to EILSEQ for
b). It also reports EILSEQ for c), although its man page iconv(3) does not
mention this case.
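
As a rough sketch of how a caller sees these cases, the following C fragment
runs one conversion from UTF-8 and hands back the errno value. The helper
name conversion_errno is made up for illustration, and the result for c) is
what I would expect from glibc; other iconv implementations may behave
differently there.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): convert `len` bytes of
   UTF-8 input to the encoding `to` and return the errno that iconv()
   reported, 0 if the conversion succeeded, or -1 if the converter could
   not be opened. */
int conversion_errno(const char *to, const char *in, size_t len)
{
    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char out[64];
    char *inp = (char *)in;   /* iconv() takes char **, not const char ** */
    char *outp = out;
    size_t inleft = len;
    size_t outleft = sizeof out;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = (rc == (size_t)-1) ? errno : 0;

    iconv_close(cd);
    return err;
}
```

With glibc, conversion_errno("US-ASCII", "\xc3", 1) should give EINVAL
(case a), conversion_errno("US-ASCII", "\xf6" "a", 2) should give EILSEQ
(case b), and conversion_errno("US-ASCII", "\xc3\xb6", 2) should give EILSEQ
as well (case c). The string literals are split so that the C compiler does
not read the following "a" as part of the hex escape.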

So the problem might simply be that the -c option hides only one of the two
cases b) and c). I cannot reproduce that on the current glibc, but then I do
not have your file to check with. I _can_ reproduce the problem when the
//TRANSLIT option is used, because //TRANSLIT and //IGNORE do not work
together: -c will just suppress the error message, but the transliteration
will still have stopped at the first invalid sequence.

$ echo -e "a\xc3\xb6a\xf6a\xc3\xb6" | LC_ALL=de_DE iconv -f UTF-8 -t US-ASCII//TRANSLIT
aoeaiconv: ungültige Eingabe-Sequenz an der Stelle 4
$

(each escaped sequence is an ö in either UTF-8 or ISO-8859-1; the German
message means "illegal input sequence at position 4")
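
For comparison, here is a minimal sketch of what a caller can do to really
skip bad input instead of stopping at it: whenever iconv() fails with EILSEQ,
drop one input byte and call it again. This is not how the iconv program
implements -c, only an illustration; the function name convert_skipping and
its parameters are invented for this example.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` bytes at `in` from encoding `from`
   to encoding `to`, dropping one input byte each time iconv() reports
   EILSEQ and then carrying on. Returns the number of bytes written to
   `out`, or (size_t)-1 on a hard error (unknown encoding, output buffer
   too small). If `skipped` is not NULL it receives the number of input
   bytes that were dropped. */
size_t convert_skipping(const char *to, const char *from,
                        const char *in, size_t inlen,
                        char *out, size_t outlen, size_t *skipped)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen, outleft = outlen, dropped = 0;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                    /* all remaining input converted */
        if (errno == EILSEQ) {        /* cases b) and c): drop one byte */
            inp++;
            inleft--;
            dropped++;
        } else if (errno == EINVAL) { /* case a): truncated tail, drop it */
            dropped += inleft;
            inleft = 0;
        } else {                      /* E2BIG or something unexpected */
            iconv_close(cd);
            return (size_t)-1;
        }
    }

    iconv_close(cd);
    if (skipped != NULL)
        *skipped = dropped;
    return (size_t)(outp - out);
}
```

Dropping a single byte is crude for case c), since the remaining bytes of
the unrepresentable character are then rejected one by one as invalid, but
the conversion does reach the end of the input.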


