Bug#215247: marked as done (libc6: iconv seems not to handle utf-8 as specified in rfc2279)
Your message dated Sat, 11 Oct 2003 18:43:28 +0400
with message-id <200310111843.28363@sercond.localdomain>
and subject line Correction
has caused the attached Bug report to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere. Please contact me immediately.)
Debian bug tracking system administrator
(administrator, Debian Bugs database)
--------------------------------------
Received: (at submit) by bugs.debian.org; 11 Oct 2003 14:24:23 +0000
>From yoush@cs.msu.su Sat Oct 11 09:24:23 2003
Return-path: <yoush@cs.msu.su>
Received: from mail.dubki.ru [80.240.116.2]
by master.debian.org with esmtp (Exim 3.35 1 (Debian))
id 1A8KfO-0005A7-00; Sat, 11 Oct 2003 09:24:23 -0500
Received: by mail.dubki.ru (Postfix, from userid 1708)
id 6E2A9D; Sat, 11 Oct 2003 18:24:21 +0400 (MSD)
Received: from sercond (sercond.dubki.ru [172.16.4.21])
by mail.dubki.ru (Postfix) with ESMTP
id 12850C; Sat, 11 Oct 2003 18:24:19 +0400 (MSD)
Received: from nikita by sercond with local (Exim 3.35 #1 (Debian))
id 1A8Kcs-0003RW-00; Sat, 11 Oct 2003 18:21:46 +0400
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="KOI8-R"
From: "Nikita V. Youshchenko" <yoush@cs.msu.su>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: libc6: iconv seems not to handle utf-8 as specified in rfc2279
X-Mailer: reportbug 2.29
Date: Sat, 11 Oct 2003 18:21:46 +0400
Message-Id: <E1A8Kcs-0003RW-00@sercond>
Delivered-To: submit@bugs.debian.org
X-Spam-Status: No, hits=-6.5 required=4.0
tests=BAYES_01,HAS_PACKAGE
version=2.53-bugs.debian.org_2003_10_09
X-Spam-Level:
X-Spam-Checker-Version: SpamAssassin 2.53-bugs.debian.org_2003_10_09 (1.174.2.15-2003-03-30-exp)
Package: libc6
Version: 2.3.2-8
Severity: normal
UTF-8 encoding is specified in RFC2279 as follows:
UCS-4 range (hex.)     UTF-8 octet sequence (binary)
0000 0000-0000 007F    0xxxxxxx
0000 0080-0000 07FF    110xxxxx 10xxxxxx
0000 0800-0000 FFFF    1110xxxx 10xxxxxx 10xxxxxx
0001 0000-001F FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0020 0000-03FF FFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
0400 0000-7FFF FFFF    1111110x 10xxxxxx ... 10xxxxxx
This means that ASCII characters (hex 20 - 7F range) have multiple
representations. In fact, this is a well-known issue in security analysis.
E.g. the '.' character has the following representations:
2E
C0 AE
E0 80 AE
F0 80 80 AE
F8 80 80 80 AE
FC 80 80 80 80 AE.
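The six byte sequences above can be derived by filling each row's bit pattern with the code point 0x2E and padding the unused high bits with zeros. The following is a minimal Python sketch of that mechanical reading of the table; the helper name `pad_to_length` is hypothetical, and it deliberately ignores any shortest-form requirement:

```python
def pad_to_length(code_point: int, length: int) -> bytes:
    """Fill the bit pattern of the given row (1..6 octets) with code_point.

    Hypothetical helper for illustration: it does NOT enforce that the
    shortest possible row is used, so it can produce overlong sequences.
    """
    if not 1 <= length <= 6:
        raise ValueError("length must be between 1 and 6 octets")
    if length == 1:
        if code_point > 0x7F:
            raise ValueError("code point does not fit in one octet")
        return bytes([code_point])

    # The lead octet carries (7 - length) payload bits; each trailing
    # octet carries 6 payload bits.
    payload_bits = (7 - length) + 6 * (length - 1)
    if code_point >= 1 << payload_bits:
        raise ValueError("code point does not fit in this row")

    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code_point & 0x3F))  # 10xxxxxx
        code_point >>= 6
    # Lead octet: `length` one-bits, a zero bit, then the remaining payload.
    lead = ((0xFF << (8 - length)) & 0xFF) | code_point
    return bytes([lead] + tail[::-1])


forms = [
    " ".join(f"{b:02X}" for b in pad_to_length(0x2E, n))
    for n in range(1, 7)
]
for form in forms:
    print(form)
```

Only the first of the printed forms uses the shortest row for 0x2E; the other five are padded into longer rows.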
However, iconv can handle only the first of these representations:
nikita@bliss:~> printf '\x2E\n' | iconv -f utf-8 -t us-ascii
.
nikita@bliss:~> printf '\xC0\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xE0\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xF0\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xF8\x80\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xFC\x80\x80\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
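The same comparison can be made without iconv. This is a small sketch that assumes Python 3's built-in strict UTF-8 codec as a stand-in decoder; it is not the iconv implementation, and the helper name `accepts` is illustrative:

```python
# The six candidate encodings of '.' tested in the transcript above.
SEQUENCES = [
    bytes.fromhex(s)
    for s in ("2E", "C0AE", "E080AE", "F08080AE", "F8808080AE", "FC80808080AE")
]


def accepts(raw: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


results = [accepts(seq) for seq in SEQUENCES]
for seq, ok in zip(SEQUENCES, results):
    print(seq.hex().upper(), "accepted" if ok else "rejected")
```

This decoder accepts only the one-octet form and rejects the other five, which matches the iconv behaviour shown in the transcript.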
-- System Information:
Debian Release: 3.0
Architecture: i386
Kernel: Linux sercond 2.4.21 #1 Wed Jul 30 22:24:06 MSD 2003 i686
Locale: LANG=ru_RU.KOI8-R, LC_CTYPE=ru_RU.KOI8-R
Versions of packages libc6 depends on:
ii libdb1-compat 2.1.3-7 The Berkeley database routines [gl
-- no debconf information
---------------------------------------
Received: (at 215247-done) by bugs.debian.org; 11 Oct 2003 14:43:26 +0000
>From yoush@cs.msu.su Sat Oct 11 09:43:25 2003
Return-path: <yoush@cs.msu.su>
Received: from mail.dubki.ru [80.240.116.2]
by master.debian.org with esmtp (Exim 3.35 1 (Debian))
id 1A8Kxp-0006uO-00; Sat, 11 Oct 2003 09:43:25 -0500
Received: by mail.dubki.ru (Postfix, from userid 1708)
id 289FA247; Sat, 11 Oct 2003 18:43:21 +0400 (MSD)
Received: from sercond (sercond.dubki.ru [172.16.4.21])
by mail.dubki.ru (Postfix) with ESMTP
id AC9021C; Sat, 11 Oct 2003 18:43:18 +0400 (MSD)
Received: from localhost ([127.0.0.1])
by sercond with esmtp (Exim 3.35 #1 (Debian))
id 1A8Kxs-0003WZ-00; Sat, 11 Oct 2003 18:43:28 +0400
From: "Nikita V. Youshchenko" <yoush@cs.msu.su>
To: 215247-done@bugs.debian.org
Subject: Correction
Date: Sat, 11 Oct 2003 18:43:28 +0400
User-Agent: KMail/1.5.3
Cc: yoush@cs.msu.su
MIME-Version: 1.0
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200310111843.28363@sercond.localdomain>
Delivered-To: 215247-done@bugs.debian.org
X-Spam-Status: No, hits=-2.0 required=4.0
tests=BAYES_00
version=2.53-bugs.debian.org_2003_10_09
X-Spam-Level:
X-Spam-Checker-Version: SpamAssassin 2.53-bugs.debian.org_2003_10_09 (1.174.2.15-2003-03-30-exp)
Sorry, there was a mistake in my previous message: the RFC cited does not
allow duplicate representations. Moreover, it explicitly warns that illegal
sequences should not be used. I was fooled by some papers on web
security.