
Bug#395817: marked as done (iconv fails for non-ASCII characters in (seemingly all) ISO-8859-# charsets)



Your message dated Sat, 28 Oct 2006 11:57:43 +0200
with message-id <20061028095743.GA13903@henry.aurel32.net>
and subject line iconv fails for non-ASCII characters in (seemingly all) ISO-8859-# charsets
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Package: libc6
Version: 2.3.2.ds1-22
Severity: major

Iconv fails to convert ISO-8859-1 apostrophe to UTF-8.  It works OK
if input encoding is specified as cp1251.

ISO-8859-1 is perhaps the most widespread single-byte encoding; iconv
_must_ work with it.  Therefore I set severity to `major'.

To check:

$ iconv --from iso-8859-1 --to utf-8 iso-8859-1-test
$ iconv --from cp1251     --to utf-8 iso-8859-1-test

Or:

$ iconv --from iso-8859-1 --to utf-8 iso-8859-1-test | wc -c
$ iconv --from cp1251     --to utf-8 iso-8859-1-test | wc -c

(I get 81 and 82 respectively.  The apostrophe in ISO-8859-1 is not
touched and remains as an invalid UTF-8 sequence.  In cp1251 it is
converted to a valid UTF-8 apostrophe, as expected.)
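
The byte counts above can be reproduced outside iconv.  What follows is
only a minimal sketch, not the glibc code path: it assumes that Python's
"iso-8859-1" and "cp1251" codecs agree with iconv's tables for this
byte, that the apostrophe in the test file is the single byte 0x92, and
that the file ends with one newline.  The helper name convert() is
made up for illustration.

```python
# The test line from the report, with the apostrophe written as byte 0x92
# and an assumed trailing newline.
line = b"Because he\x92s already safe in the corner, Black can continue to build influence.\n"


def convert(data: bytes, source_encoding: str) -> bytes:
    """Decode with the given single-byte codec, then re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


as_latin1 = convert(line, "iso-8859-1")
as_cp1251 = convert(line, "cp1251")

# Input size, then output size for each source encoding.
print(len(line), len(as_latin1), len(as_cp1251))

# The bytes produced for the apostrophe (it sits at offset 10 in the line).
print(as_latin1[10:12].hex(), as_cp1251[10:13].hex())
```

Under these assumptions the apostrophe becomes two bytes when read as
ISO-8859-1 and three bytes when read as cp1251, which accounts for the
one-byte difference in the wc -c output.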

Apostrophe is considered an illegal character in ISO-8859-1.  The
behavior can also be seen in gedit and Emacs.  Other ISO-8859-#
charsets show the same behavior.

Kernel version: 2.6.8-3-k7

Test case is attached.

Because he’s already safe in the corner, Black can continue to build influence.

--- End Message ---
--- Begin Message ---
Hi,

On Sat, Oct 28, 2006 at 01:13:22AM +0300, Paul Pogonyshev wrote:
> Package: libc6
> Version: 2.3.2.ds1-22
> Severity: major
> 
> Iconv fails to convert ISO-8859-1 apostrophe to UTF-8.  It works OK
> if input encoding is specified as cp1251.
> 
> ISO-8859-1 is perhaps the most widespread single-byte encoding; iconv
> _must_ work with it.  Therefore I set severity to `major'.
> 
> To check:
> 
> $ iconv --from iso-8859-1 --to utf-8 iso-8859-1-test
> $ iconv --from cp1251     --to utf-8 iso-8859-1-test
> 
> Or:
> 
> $ iconv --from iso-8859-1 --to utf-8 iso-8859-1-test | wc -c
> $ iconv --from cp1251     --to utf-8 iso-8859-1-test | wc -c
> 
> (I get 81 and 82 respectively.  The apostrophe in ISO-8859-1 is not
> touched and remains as an invalid UTF-8 sequence.  In cp1251 it is
> converted to a valid UTF-8 apostrophe, as expected.)
> 
> Apostrophe is considered an illegal character in ISO-8859-1.  The
> behavior can also be seen in gedit and Emacs.  Other ISO-8859-#
> charsets show the same behavior.
> 

This is normal behaviour: the char 0x92 is defined in the cp1251
charset but not in the ISO-8859-1 charset, where it is either a
control character or an illegal character, depending on the
implementation.
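
To illustrate the point about 0x92, here is a small sketch that asks
Python (not iconv) what that byte means in each charset.  It assumes
Python's codec tables are representative; Python's "iso-8859-1" codec
happens to take the "control character" interpretation rather than
rejecting the byte.  The helper name describe() is hypothetical.

```python
import unicodedata


def describe(byte_value: int, encoding: str) -> tuple[str, str]:
    """Return (code point, Unicode general category) for one byte in a codec."""
    ch = bytes([byte_value]).decode(encoding)
    return f"U+{ord(ch):04X}", unicodedata.category(ch)


for encoding in ("iso-8859-1", "cp1251"):
    print(encoding, describe(0x92, encoding))
```

Category "Cc" marks a control character and "Pf" marks closing
(final) quotation punctuation, so the same byte is a control code in
one charset and a typographic apostrophe in the other.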

Closing the bug.

Bye,
Aurelien

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian GNU/Linux developer | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

--- End Message ---
