
Bug#215247: libc6: iconv seems not to handle utf-8 as specified in rfc2279



Package: libc6
Version: 2.3.2-8
Severity: normal

UTF-8 encoding is specified in RFC2279 as follows:

   UCS-4 range (hex.)           UTF-8 octet sequence (binary)
   0000 0000-0000 007F   0xxxxxxx
   0000 0080-0000 07FF   110xxxxx 10xxxxxx
   0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx

   0001 0000-001F FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   0020 0000-03FF FFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   0400 0000-7FFF FFFF   1111110x 10xxxxxx ... 10xxxxxx

This means that ASCII characters (hex 20 - 7F range) have multiple
representations. In fact, this is a well-known issue in security analysis.

E.g. the '.' character has the following representations:

2E
C0 AE
E0 80 AE
F0 80 80 AE
F8 80 80 80 AE
FC 80 80 80 80 AE.
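
For illustration, the six forms above can be derived mechanically from the
table. The Python sketch below is not part of the original report, and
encode_with_length is a hypothetical helper name. It fills the bit pattern
for a chosen sequence length and deliberately does not check that the length
is the shortest one possible.

```python
def encode_with_length(cp, n):
    """Encode code point `cp` as an `n`-octet sequence per the RFC 2279 table.

    Hypothetical helper: it does NOT enforce the shortest form, so it can
    produce the longer representations listed in the report.
    """
    if not 1 <= n <= 6:
        raise ValueError("sequence length must be between 1 and 6")
    if n == 1:
        if cp >= 0x80:
            raise ValueError("code point does not fit in one octet")
        return bytes([cp])
    # The lead octet carries 7 - n payload bits, each trailing octet 6 bits.
    payload_bits = (7 - n) + 6 * (n - 1)
    if cp >= (1 << payload_bits):
        raise ValueError("code point does not fit in this sequence length")
    prefix = (0xFF << (8 - n)) & 0xFF          # n one-bits, then zeros
    octets = [prefix | (cp >> (6 * (n - 1)))]  # lead octet
    for i in range(n - 1):
        shift = 6 * (n - 2 - i)
        octets.append(0x80 | ((cp >> shift) & 0x3F))
    return bytes(octets)


if __name__ == "__main__":
    # Print every representation of '.' (0x2E), one per line, as hex octets.
    for length in range(1, 7):
        encoded = encode_with_length(0x2E, length)
        print(" ".join(f"{b:02X}" for b in encoded))
```

Run as a script, this prints the same six octet sequences as the list above.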

However, iconv can handle only the first of these representations:

nikita@bliss:~> printf '\x2E\n' | iconv -f utf-8 -t us-ascii
.
nikita@bliss:~> printf '\xC0\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xE0\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xF0\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xF8\x80\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xFC\x80\x80\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
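
To make the difference between the accepted input and the rejected inputs
explicit, here is a rough Python sketch (not taken from iconv or glibc; all
names are hypothetical). It decodes a single sequence purely by the bit
patterns in the table, then reports whether a shorter sequence could have
carried the same value. This is only an assumption about what distinguishes
the five rejected inputs, not a description of how iconv decides.

```python
# Smallest code point that actually needs each sequence length,
# taken from the left-hand column of the RFC 2279 table.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800,
                  4: 0x10000, 5: 0x200000, 6: 0x4000000}


def sequence_length(lead):
    """Return the length announced by a lead octet, or 0 if it is not one."""
    if lead < 0x80:
        return 1
    for n in range(2, 7):
        prefix = (0xFF << (8 - n)) & 0xFF  # n one-bits, then zeros
        mask = (0xFF << (7 - n)) & 0xFF    # the one-bits plus the zero after
        if lead & mask == prefix:
            return n
    return 0


def lenient_decode(seq):
    """Decode exactly one sequence by bit pattern; return (code point, length)."""
    if not seq:
        raise ValueError("empty input")
    n = sequence_length(seq[0])
    if n == 0 or len(seq) != n:
        raise ValueError("malformed sequence")
    if n == 1:
        return seq[0], 1
    cp = seq[0] & (0xFF >> (n + 1))        # payload bits of the lead octet
    for octet in seq[1:]:
        if octet & 0xC0 != 0x80:
            raise ValueError("bad continuation octet")
        cp = (cp << 6) | (octet & 0x3F)
    return cp, n


def is_overlong(seq):
    """True if the value would also fit in a shorter sequence."""
    cp, n = lenient_decode(seq)
    return cp < MIN_FOR_LENGTH[n]
```

With this sketch, is_overlong(b'\x2E') is false, while each of the five
longer inputs from the transcript decodes to 0x2E and is reported as longer
than necessary.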


-- System Information:
Debian Release: 3.0
Architecture: i386
Kernel: Linux sercond 2.4.21 #1 Wed Jul 30 22:24:06 MSD 2003 i686
Locale: LANG=ru_RU.KOI8-R, LC_CTYPE=ru_RU.KOI8-R

Versions of packages libc6 depends on:
ii  libdb1-compat                 2.1.3-7    The Berkeley database routines [gl

-- no debconf information



