
Bug#215247: marked as done (libc6: iconv seems not to handle utf-8 as specified in rfc2279)



Your message dated Sat, 11 Oct 2003 18:43:28 +0400
with message-id <200310111843.28363@sercond.localdomain>
and subject line Correction
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--------------------------------------
Received: (at submit) by bugs.debian.org; 11 Oct 2003 14:24:23 +0000
>From yoush@cs.msu.su Sat Oct 11 09:24:23 2003
Return-path: <yoush@cs.msu.su>
Received: from mail.dubki.ru [80.240.116.2] 
	by master.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1A8KfO-0005A7-00; Sat, 11 Oct 2003 09:24:23 -0500
Received: by mail.dubki.ru (Postfix, from userid 1708)
	id 6E2A9D; Sat, 11 Oct 2003 18:24:21 +0400 (MSD)
Received: from sercond (sercond.dubki.ru [172.16.4.21])
	by mail.dubki.ru (Postfix) with ESMTP
	id 12850C; Sat, 11 Oct 2003 18:24:19 +0400 (MSD)
Received: from nikita by sercond with local (Exim 3.35 #1 (Debian))
	id 1A8Kcs-0003RW-00; Sat, 11 Oct 2003 18:21:46 +0400
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="KOI8-R"
From: "Nikita V. Youshchenko" <yoush@cs.msu.su>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: libc6: iconv seems not to handle utf-8 as specified in rfc2279
X-Mailer: reportbug 2.29
Date: Sat, 11 Oct 2003 18:21:46 +0400
Message-Id: <E1A8Kcs-0003RW-00@sercond>
Delivered-To: submit@bugs.debian.org
X-Spam-Status: No, hits=-6.5 required=4.0
	tests=BAYES_01,HAS_PACKAGE
	version=2.53-bugs.debian.org_2003_10_09
X-Spam-Level: 
X-Spam-Checker-Version: SpamAssassin 2.53-bugs.debian.org_2003_10_09 (1.174.2.15-2003-03-30-exp)

Package: libc6
Version: 2.3.2-8
Severity: normal

UTF-8 encoding is specified in RFC2279 as follows:

   UCS-4 range (hex.)           UTF-8 octet sequence (binary)
   0000 0000-0000 007F   0xxxxxxx
   0000 0080-0000 07FF   110xxxxx 10xxxxxx
   0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx

   0001 0000-001F FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   0020 0000-03FF FFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   0400 0000-7FFF FFFF   1111110x 10xxxxxx ... 10xxxxxx

This means that ASCII characters (hex 20 - 7F range) have multiple
representations. In fact, this is a well-known issue in security analysis.

E.g. the '.' character has the following representations:

2E
C0 AE
E0 80 AE
F0 80 80 AE
F8 80 80 80 AE
FC 80 80 80 80 AE.
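
The six octet sequences above are what results from packing the code
point of '.' (hex 2E) into each row of the table without checking
whether a shorter row would do. A minimal Python sketch of that packing
follows; the helper name pack_utf8 is hypothetical and is not part of
iconv or of the RFC.

```python
def pack_utf8(code_point: int, length: int) -> bytes:
    """Pack code_point into the `length`-octet pattern of the RFC 2279 table.

    Deliberately does NOT check that `length` is the shortest possible
    form, so it can produce the non-shortest sequences listed above.
    """
    if not 1 <= length <= 6:
        raise ValueError("RFC 2279 patterns are 1 to 6 octets long")
    if length == 1:
        if code_point > 0x7F:
            raise ValueError("code point does not fit in one octet")
        return bytes([code_point])
    # Payload bits: (7 - length) in the lead octet, 6 per continuation octet.
    payload_bits = (7 - length) + 6 * (length - 1)
    if code_point >= (1 << payload_bits):
        raise ValueError("code point does not fit in this pattern")
    # Lead octet marker: `length` one-bits followed by a zero bit.
    lead_marker = (0xFF << (8 - length)) & 0xFF
    octets = []
    for _ in range(length - 1):
        octets.append(0x80 | (code_point & 0x3F))  # 10xxxxxx
        code_point >>= 6
    octets.append(lead_marker | code_point)
    return bytes(reversed(octets))


for n in range(1, 7):
    print(" ".join(f"{b:02X}" for b in pack_utf8(0x2E, n)))
```

Only the one-octet form is the shortest encoding of hex 2E; the other
five are the ones iconv refuses in the transcript that follows.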

However, iconv can handle only the first of these representations:

nikita@bliss:~> printf '\x2E\n' | iconv -f utf-8 -t us-ascii
.
nikita@bliss:~> printf '\xC0\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xE0\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xF0\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xF8\x80\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
nikita@bliss:~> printf '\xFC\x80\x80\x80\x80\xAE\n' | iconv -f utf-8 -t us-ascii
iconv: illegal input sequence at position 0
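
As a cross-check outside glibc, the same six inputs can be fed to
another strict UTF-8 decoder. The sketch below assumes CPython 3 and its
built-in "utf-8" codec; the names SEQUENCES and strict_decode are
hypothetical. It mirrors the transcript above: only the one-octet form
is accepted.

```python
# The six candidate encodings of '.' from the report, as hex strings.
SEQUENCES = [
    bytes.fromhex(s)
    for s in ("2E", "C0AE", "E080AE", "F08080AE", "F8808080AE", "FC80808080AE")
]


def strict_decode(data: bytes):
    """Return the decoded text, or None if the codec rejects the input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Map each input (as lowercase hex) to its decoded text or None.
results = {seq.hex(): strict_decode(seq) for seq in SEQUENCES}
for hex_form, text in results.items():
    print(hex_form, "->", repr(text))
```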


-- System Information:
Debian Release: 3.0
Architecture: i386
Kernel: Linux sercond 2.4.21 #1 Срд Июл 30 22:24:06 MSD 2003 i686
Locale: LANG=ru_RU.KOI8-R, LC_CTYPE=ru_RU.KOI8-R

Versions of packages libc6 depends on:
ii  libdb1-compat                 2.1.3-7    The Berkeley database routines [gl

-- no debconf information


---------------------------------------
Received: (at 215247-done) by bugs.debian.org; 11 Oct 2003 14:43:26 +0000
>From yoush@cs.msu.su Sat Oct 11 09:43:25 2003
Return-path: <yoush@cs.msu.su>
Received: from mail.dubki.ru [80.240.116.2] 
	by master.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1A8Kxp-0006uO-00; Sat, 11 Oct 2003 09:43:25 -0500
Received: by mail.dubki.ru (Postfix, from userid 1708)
	id 289FA247; Sat, 11 Oct 2003 18:43:21 +0400 (MSD)
Received: from sercond (sercond.dubki.ru [172.16.4.21])
	by mail.dubki.ru (Postfix) with ESMTP
	id AC9021C; Sat, 11 Oct 2003 18:43:18 +0400 (MSD)
Received: from localhost ([127.0.0.1])
	by sercond with esmtp (Exim 3.35 #1 (Debian))
	id 1A8Kxs-0003WZ-00; Sat, 11 Oct 2003 18:43:28 +0400
From: "Nikita V. Youshchenko" <yoush@cs.msu.su>
To: 215247-done@bugs.debian.org
Subject: Correction
Date: Sat, 11 Oct 2003 18:43:28 +0400
User-Agent: KMail/1.5.3
Cc: yoush@cs.msu.su
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200310111843.28363@sercond.localdomain>
Delivered-To: 215247-done@bugs.debian.org
X-Spam-Status: No, hits=-2.0 required=4.0
	tests=BAYES_00
	version=2.53-bugs.debian.org_2003_10_09
X-Spam-Level: 
X-Spam-Checker-Version: SpamAssassin 2.53-bugs.debian.org_2003_10_09 (1.174.2.15-2003-03-30-exp)

Sorry, there was a mistake in my previous message: the RFC cited does not
promote duplicates. Moreover, it explicitly warns that illegal
sequences should not be used. I was fooled by some papers on web
security.
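
One way a decoder might enforce this is to decode the octets according
to the table and then reject any result that would have fit in a shorter
pattern. The following is a minimal Python sketch under that assumption,
not glibc's actual implementation; MIN_CODE_POINT, sequence_length and
decode_first are hypothetical names.

```python
# Smallest code point that legitimately needs each sequence length,
# taken from the left-hand column of the RFC 2279 table.
MIN_CODE_POINT = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000, 6: 0x4000000}


def sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead octet."""
    if lead < 0x80:
        return 1
    for length in range(2, 7):
        marker = (0xFF << (8 - length)) & 0xFF  # e.g. 110 00000 for length 2
        mask = (0xFF << (7 - length)) & 0xFF    # e.g. 111 00000 for length 2
        if lead & mask == marker:
            return length
    raise ValueError("invalid lead octet")


def decode_first(data: bytes) -> int:
    """Decode the first sequence in data, rejecting non-shortest forms."""
    if not data:
        raise ValueError("empty input")
    length = sequence_length(data[0])
    if len(data) < length:
        raise ValueError("truncated sequence")
    if length == 1:
        return data[0]
    # Keep only the payload bits of the lead octet.
    code_point = data[0] & (0xFF >> (length + 1))
    for octet in data[1:length]:
        if octet & 0xC0 != 0x80:
            raise ValueError("invalid continuation octet")
        code_point = (code_point << 6) | (octet & 0x3F)
    if code_point < MIN_CODE_POINT[length]:
        raise ValueError("non-shortest (illegal) sequence")
    return code_point
```

With this check, decode_first(b"\x2e") returns hex 2E, while the
two-octet input C0 AE raises ValueError instead of yielding '.'.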


