Bug#525299: iconv accepts UTF-8-encoded UTF-16 surrogates
Package: libc6
Version: 2.9-7
Severity: normal
$ man utf-8 | grep -A 2 UTF-16 | sed -e 's/^ *//'
The UCS code values 0xd800–0xdfff (UTF-16 surrogates) as well as 0xfffe
and 0xffff (UCS non-characters) should not appear in conforming UTF-8
streams.
$ s='\xed\xa0\x88\xed\xbd\x85' # UTF-8-style encoding of the surrogates 0xd808 and 0xdf45
$ for e in UTF-8 UTF-16 UTF-32 UCS-4
  do
    printf '%s\t' "$e"
    printf "$s" | iconv -f UTF-8 -t "$e" > /dev/null && printf 'OK\n'
  done
UTF-8 OK
UTF-16 iconv: illegal input sequence at position 0
UTF-32 iconv: illegal input sequence at position 0
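UCS-4 OK

The input is rejected only when converting to UTF-16 or UTF-32; the
conversions to UTF-8 and UCS-4 accept the encoded surrogates.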
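
For reference, here is a rough sketch of a check that does not depend on iconv and confirms that the test input really consists of two UTF-8-encoded surrogates. The helper names has_encoded_surrogate and decode3 are hypothetical and not part of glibc. The bytes are written with octal escapes on the assumption that the shell's printf may not understand \x escapes.

```shell
#!/bin/sh
# Sketch only: has_encoded_surrogate and decode3 are hypothetical helpers,
# not part of glibc or iconv.

# Succeed if stdin contains byte 0xED followed by a byte in 0xA0..0xBF,
# i.e. the start of a three-byte sequence decoding to 0xd800..0xdfff.
has_encoded_surrogate() {
    od -An -v -tx1 | awk '
        {
            for (i = 1; i <= NF; i++) {
                if (prev == "ed" && $i ~ /^[ab][0-9a-f]$/) found = 1
                prev = $i
            }
        }
        END { exit !found }'
}

# Decode one three-byte UTF-8 sequence given as three hex bytes.
decode3() {
    printf '0x%04x\n' \
        $(( ((0x$1 & 0x0f) << 12) | ((0x$2 & 0x3f) << 6) | (0x$3 & 0x3f) ))
}

# Same bytes as in the report, written with octal escapes.
s='\355\240\210\355\275\205'
if printf "$s" | has_encoded_surrogate; then
    echo 'input contains UTF-8-encoded surrogates'
fi
decode3 ed a0 88   # prints 0xd808
decode3 ed bd 85   # prints 0xdf45
```

Because 0xED can only be a lead byte in UTF-8, a following byte in 0xA0..0xBF is enough to identify an encoded surrogate without parsing the rest of the stream.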
-- System Information:
Debian Release: squeeze/sid
APT prefers unstable
APT policy: (900, 'unstable'), (500, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.26-1-686 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages libc6 depends on:
ii libgcc1 1:4.3.3-8 GCC support library
libc6 recommends no packages.
Versions of packages libc6 suggests:
ii glibc-doc 2.9-7 GNU C Library: Documentation
ii libc6-i686 2.9-7 GNU C Library: Shared libraries [i
ii locales 2.9-7 GNU C Library: National Language (
-- debconf information:
glibc/upgrade: true
glibc/disable-screensaver:
glibc/restart-failed:
* glibc/restart-services:
--
Jakub Wilk