
Bug#152620: marked as done (sscanf with %l[...] format is completely broken)



Your message dated Mon, 11 Oct 2004 08:55:25 +0900
with message-id <81ekk6uo0i.wl@omega.webmasters.gr.jp>
and subject line sscanf with %l[...] format is completely broken
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--------------------------------------
Received: (at submit) by bugs.debian.org; 11 Jul 2002 13:46:32 +0000
>From simon@ixion.tartarus.org Thu Jul 11 08:46:32 2002
Return-path: <simon@ixion.tartarus.org>
Received: from ixion.tartarus.org [195.149.39.210] 
	by master.debian.org with esmtp (Exim 3.12 1 (Debian))
	id 17SeHA-0004dx-00; Thu, 11 Jul 2002 08:46:32 -0500
Received: from simon by ixion.tartarus.org with local (Exim 3.12 #1 (Debian))
	id 17SeH9-00071T-00; Thu, 11 Jul 2002 14:46:31 +0100
X-Mailer: Jed/Timber v0.2
From: Simon Tatham <anakin@pobox.com>
To: submit@bugs.debian.org
Subject: sscanf with %l[...] format is completely broken
Message-Id: <E17SeH9-00071T-00@ixion.tartarus.org>
Sender: Simon Tatham <simon@ixion.tartarus.org>
Date: Thu, 11 Jul 2002 14:46:31 +0100
Delivered-To: submit@bugs.debian.org

Package: libc6
Version: 2.2.5-6

Using sscanf with the %l[...] format (to scan a multi-byte string
while accepting only a particular range of bytes), glibc produces an
assertion failure. Here's my sample test case: note that it requires
a UTF-8 locale to be installed on the system. (I'd imagine it would
fail the same way in other UTF-8 locales - I can't believe the en_GB
bit is vital!)

#include <stdio.h>   /* sscanf, printf */
#include <locale.h>
#include <wchar.h>

int main(void)
{
    int ret;
    wchar_t str2[10];
    char utf8[] = "\xE2\xA5\xA1\xE3\xA5\xA2\xE4\xA5\xA3\xE5\xA5\xA4";

    setlocale(LC_ALL, "en_GB.UTF-8");
    ret = sscanf(utf8, "%l[\xA1-\xA5\xE2-\xE5]", str2);
    printf("simple scan yielded %d, \"%ls\"\n", ret, str2);

    return 0;
}

That UTF-8 string contains four Unicode characters (U+2961, U+3962,
U+4963, U+5964) and all the bytes used are within the scan set, so I
think that sscanf should return 1 and store a four-character Unicode
string in str2. However, when I actually try it:

: judicator; gcc -o mbtest mbtest.c && ./mbtest
mbtest: vfscanf.c:2102: _IO_vfscanf: Assertion `cnt <
(__ctype_get_mb_cur_max ())' failed.
Aborted

Investigation of the glibc source (as obtained from the public CVS
repository at :pserver:anoncvs@anoncvs.cygnus.com:/cvs/glibc)
suggests that the problem is in vfscanf.c, where the counter `cnt'
is initialised to zero at the start of the scan and incremented
every time a byte is added to the current multibyte character. The
routine assertion that cnt should always be less than the maximum
size of an MB-char fails because cnt is never reset to zero when a
character is output!

(I'm pretty sure of this diagnosis because I've not only checked the
source but debugged through the Debian libc binary at the assembly
level.)

The following patch (untested, but it looks trivial to me) should
solve the problem. It's a patch against rev 1.100 of
stdio-common/vfscanf.c from the CVS repository mentioned above.

Index: vfscanf.c
===================================================================
RCS file: /cvs/glibc/libc/stdio-common/vfscanf.c,v
retrieving revision 1.100
diff -u -r1.100 vfscanf.c
--- vfscanf.c	11 Jul 2002 08:32:19 -0000	1.100
+++ vfscanf.c	11 Jul 2002 13:41:56 -0000
@@ -2107,6 +2107,7 @@
 			  continue;
 			}
 
+                      cnt = 0;
 		      ++wstr;
 		      if ((flags & MALLOC)
 			  && wstr == (wchar_t *) *strptr + strsize)

Cheers,
Simon
-- 
Simon Tatham         "Selfless? I'm so selfless I
<anakin@pobox.com>    don't even know who I am."

---------------------------------------
Received: (at 152620-done) by bugs.debian.org; 10 Oct 2004 23:55:26 +0000
>From gotom@debian.or.jp Sun Oct 10 16:55:26 2004
Return-path: <gotom@debian.or.jp>
Received: from omega.webmasters.gr.jp (webmasters.gr.jp) [218.44.239.78] 
	by spohr.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1CGnXC-00062P-00; Sun, 10 Oct 2004 16:55:26 -0700
Received: from omega.webmasters.gr.jp (localhost [127.0.0.1])
	by webmasters.gr.jp (Postfix) with ESMTP
	id 9DC82DEB5D; Mon, 11 Oct 2004 08:55:25 +0900 (JST)
Date: Mon, 11 Oct 2004 08:55:25 +0900
Message-ID: <81ekk6uo0i.wl@omega.webmasters.gr.jp>
From: GOTO Masanori <gotom@debian.or.jp>
To: Simon Tatham <anakin@pobox.com>
Cc: 152620-done@bugs.debian.org
Subject: Re: sscanf with %l[...] format is completely broken
User-Agent: Wanderlust/2.9.9 (Unchained Melody) SEMI/1.14.3 (Ushinoya)
 FLIM/1.14.3 (=?ISO-8859-4?Q?Unebigory=F2mae?=) APEL/10.3 Emacs/21.2
 (i386-debian-linux-gnu) MULE/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Delivered-To: 152620-done@bugs.debian.org
X-Spam-Checker-Version: SpamAssassin 2.60-bugs.debian.org_2004_03_25 
	(1.212-2003-09-23-exp) on spohr.debian.org
X-Spam-Status: No, hits=-3.0 required=4.0 tests=BAYES_00 autolearn=no 
	version=2.60-bugs.debian.org_2004_03_25
X-Spam-Level: 

> Using sscanf with the %l[...] format (to scan a multi-byte string
> while accepting only a particular range of bytes), glibc produces an
> assertion failure. Here's my sample test case: note that it requires
> a UTF-8 locale to be installed on the system. (I'd imagine it would
> fail the same way in other UTF-8 locales - I can't believe the en_GB
> bit is vital!)

In 2.3.2.ds1-17, this bug is already fixed, so I close this report.
Thanks for your report!

Regards,
-- gotom


