--- Begin Message ---
- To: 344146-done@bugs.debian.org
- Subject: Re: regex_internal.c bug (Re: Bug#344146: towupper(3) bug? (Re: re_search(3) dumps core))
- From: Aurelien Jarno <aurelien@aurel32.net>
- Date: Sun, 22 Apr 2007 23:41:23 +0200
- Message-id: <20070422214122.GA9361@i386.aurel32.net>
- In-reply-to: <87wthsdjvp.wl%ukai@debian.or.jp>
- References: <87ek447wk0.wl%ukai@debian.or.jp> <87bqz77pos.wl%ukai@debian.or.jp> <87wthsdjvp.wl%ukai@debian.or.jp>
Version: 2.5-1
On Mon, Dec 26, 2005 at 11:15:54PM +0900, Fumitoshi UKAI wrote:
> At Sat, 24 Dec 2005 01:17:55 +0900,
> Fumitoshi UKAI wrote:
>
> > > It is a bug in libc6, not in grep.
> > > grep 2.3.1.ds2-4 works fine with libc6 2.3.2.ds1-22 if I rebuild it on sarge.
> >
> > > It seems some problem in posix/regex_internal.c:build_wcs_upper_buffer().
> > >
> > > % LANG=ja_JP.EUC-JP gdb ./a.out
> > > GNU gdb 6.4-debian
> > > Copyright 2005 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > > This GDB was configured as "i486-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
> > >
> > > (gdb) run
> > > Starting program: /tmp/a.out
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0xb7f1920f in memcpy () from /lib/tls/libc.so.6
> > > (gdb) bt
> > > #0 0xb7f1920f in memcpy () from /lib/tls/libc.so.6
> > > #1 0xb7f4a07a in build_wcs_upper_buffer () from /lib/tls/libc.so.6
> > > #2 0xb7f4a335 in re_string_reconstruct () from /lib/tls/libc.so.6
> > > #3 0xb7f5bde7 in re_search_internal () from /lib/tls/libc.so.6
> > > #4 0xb7f5ea89 in re_search_stub () from /lib/tls/libc.so.6
> > > #5 0xb7f5ef63 in re_search () from /lib/tls/libc.so.6
> > > #6 0x08048618 in main (argc=1, argv=0xbffffaf4) at rtest.c:28
> > > (gdb)
> >
> > I investigated this further:
> >
> > * The input multibyte sequence is "\x8f\xa9\xc3", which is
> > LATIN SMALL LETTER ETH in EUC-JP encoding.
> >
> > * If RE_ICASE is set in re_syntax, re_search tries to convert
> > characters to upper case with build_wcs_upper_buffer().
> >
> > * When the multibyte sequence "\x8f\xa9\xc3" in EUC-JP is converted to
> > a wide character, we get 0x00F0 (LATIN SMALL LETTER ETH; U00F0).
> >
> > * This wide character (LATIN SMALL LETTER ETH; U00F0) is lower case,
> > so towupper() has to be applied to it.
> >
> > * When towupper() is applied to this wide character (LATIN SMALL LETTER
> > ETH; U00F0), we get wide character 0x00D0 (LATIN CAPITAL LETTER ETH; U00D0).
> >
> > * When wide character 0x00D0 (LATIN CAPITAL LETTER ETH; U00D0) is converted
> > back to a multibyte sequence in EUC-JP, the conversion fails, so wcrtomb()
> > returns (size_t)(-1). (There is no valid byte sequence that represents
> > LATIN CAPITAL LETTER ETH; U00D0 in EUC-JP encoding.)
> >
> > * However, build_wcs_upper_buffer() does not handle this case.
> > It assumes that mbrtowc -> towupper -> wcrtomb always succeeds and only
> > handles the case where the lengths of the multibyte sequences differ.
>
> It seems this bug has been fixed in posix/regex_internal.c 1.52 (and 1.41.2.7):
>
> http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/posix/regex_internal.c.diff?r1=1.51&r2=1.52&cvsroot=glibc
>
Yep, I am able to reproduce the bug on glibc 2.3.6 but not on glibc 2.5.
I am therefore closing the bug.
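
For reference, the round trip described in the quoted analysis (mbrtowc ->
towupper -> wcrtomb) can be sketched as below. This is only an illustration
and not the glibc code: upcase_first_char() is a hypothetical helper, and
falling back to the original bytes when wcrtomb() fails is an assumption made
for the example, not necessarily what revision 1.52 does. The point it shows
is that the return value of wcrtomb() has to be checked before it is used as
a length. An unchecked (size_t)(-1) passed on as a copy length would be
consistent with the memcpy() crash in the backtrace.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not glibc code: upper-case the first multibyte
   character of src (at most len bytes) under the current LC_CTYPE locale.
   The result is written to dst, which must hold at least MB_LEN_MAX bytes.
   Returns the number of bytes written to dst, or (size_t)-1 if src is
   empty, starts with a NUL, or starts with an invalid or incomplete
   sequence. */
size_t upcase_first_char(const char *src, size_t len, char *dst)
{
    mbstate_t in_state, out_state;
    wchar_t wc;
    char buf[MB_LEN_MAX];

    memset(&in_state, 0, sizeof in_state);
    memset(&out_state, 0, sizeof out_state);

    /* Multibyte -> wide character. */
    size_t in_len = mbrtowc(&wc, src, len, &in_state);
    if (in_len == (size_t)-1 || in_len == (size_t)-2 || in_len == 0)
        return (size_t)-1;

    /* Wide character -> upper case -> multibyte. */
    wint_t upper = towupper((wint_t)wc);
    size_t out_len = wcrtomb(buf, (wchar_t)upper, &out_state);
    if (out_len == (size_t)-1) {
        /* The upper-case form has no encoding in this locale (for example
           U00D0 in EUC-JP). Assumption for this sketch: keep the original
           bytes instead of using (size_t)-1 as a length. */
        memcpy(dst, src, in_len);
        return in_len;
    }

    memcpy(dst, buf, out_len);
    return out_len;
}
```

With LANG=ja_JP.EUC-JP and setlocale(LC_CTYPE, "") called first, passing
"\x8f\xa9\xc3" should take the fallback branch, provided the locale behaves
as described in the analysis above.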
--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32@debian.org | aurelien@aurel32.net
`- people.debian.org/~aurel32 | www.aurel32.net
--- End Message ---