
Bug#344146: regex_internal.c bug (Re: Bug#344146: towupper(3) bug? (Re: re_search(3) dumps core))



At Sat, 24 Dec 2005 01:17:55 +0900,
Fumitoshi UKAI wrote:

> > It is a bug in libc6, not in grep.
> > grep 2.3.1.ds2-4 works fine on libc6 2.3.2.ds1-22 if I rebuild it on sarge.
> 
> > There seems to be a problem in posix/regex_internal.c:build_wcs_upper_buffer().
> > 
> > % LANG=ja_JP.EUC-JP gdb ./a.out
> > GNU gdb 6.4-debian
> > Copyright 2005 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "i486-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
> > 
> > (gdb) run
> > Starting program: /tmp/a.out
> > 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0xb7f1920f in memcpy () from /lib/tls/libc.so.6
> > (gdb) bt
> > #0  0xb7f1920f in memcpy () from /lib/tls/libc.so.6
> > #1  0xb7f4a07a in build_wcs_upper_buffer () from /lib/tls/libc.so.6
> > #2  0xb7f4a335 in re_string_reconstruct () from /lib/tls/libc.so.6
> > #3  0xb7f5bde7 in re_search_internal () from /lib/tls/libc.so.6
> > #4  0xb7f5ea89 in re_search_stub () from /lib/tls/libc.so.6
> > #5  0xb7f5ef63 in re_search () from /lib/tls/libc.so.6
> > #6  0x08048618 in main (argc=1, argv=0xbffffaf4) at rtest.c:28
> > (gdb)
> 
> I investigated this further:
> 
>  * The input multibyte sequence is "\x8f\xa9\xc3", which is
>    LATIN SMALL LETTER ETH in EUC-JP encoding.
> 
>  * If RE_ICASE is set in re_syntax, re_search converts characters
>    to upper case with build_wcs_upper_buffer().
> 
>  * When the multibyte sequence "\x8f\xa9\xc3" in EUC-JP is converted to a
>    wide character, we get 0x00F0 (LATIN SMALL LETTER ETH; U00F0).
> 
>  * This wide character (LATIN SMALL LETTER ETH; U00F0) is lower case,
>    so towupper() is applied to it.
> 
>  * Calling towupper() on this wide character (LATIN SMALL LETTER ETH; U00F0)
>    yields the wide character 0x00D0 (LATIN CAPITAL LETTER ETH; U00D0).
> 
>  * When the wide character 0x00D0 (LATIN CAPITAL LETTER ETH; U00D0) is
>    converted back to a multibyte sequence in EUC-JP, the conversion fails,
>    so wcrtomb() returns (size_t)(-1).
>    (There is no valid byte sequence that represents LATIN CAPITAL LETTER ETH;
>    U00D0 in EUC-JP encoding.)
> 
>  * However, build_wcs_upper_buffer() does not handle this case.
>    It assumes that mbrtowc -> towupper -> wcrtomb always succeeds, and only
>    handles the case where the lengths of the multibyte sequences differ.
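
The round trip described above can be sketched in C roughly as follows.
This is only an illustration of the missing check, not the code from
posix/regex_internal.c and not the actual fix. upcase_mb_char() is a
hypothetical helper, and falling back to the original bytes is an
assumption about what a caller would want when the upper-case form has
no encoding in the current locale.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of glibc): upper-case the first multibyte
 * character of src (at most srclen bytes) into dst, which must have room
 * for MB_LEN_MAX bytes. Returns the number of bytes written to dst, or 0
 * if src does not start with a complete, valid, non-NUL character. */
static size_t upcase_mb_char(const char *src, size_t srclen, char *dst)
{
    mbstate_t in_state, out_state;
    wchar_t wc;
    char buf[MB_LEN_MAX];
    size_t mblen, uplen;

    assert(src != NULL && dst != NULL);
    memset(&in_state, 0, sizeof in_state);
    memset(&out_state, 0, sizeof out_state);

    /* multibyte -> wide character */
    mblen = mbrtowc(&wc, src, srclen, &in_state);
    if (mblen == (size_t)-1 || mblen == (size_t)-2 || mblen == 0)
        return 0;

    /* wide character -> upper case -> multibyte */
    uplen = wcrtomb(buf, (wchar_t)towupper((wint_t)wc), &out_state);
    if (uplen == (size_t)-1) {
        /* The upper-case character cannot be encoded in this locale.
         * Assumed fallback: keep the original bytes instead of passing
         * (size_t)-1 on to memcpy() as a length. */
        memcpy(dst, src, mblen);
        return mblen;
    }

    memcpy(dst, buf, uplen);
    return uplen;
}
```

In the default "C" locale, calling this on "a" writes "A". If the analysis
above holds, calling it on "\x8f\xa9\xc3" after setlocale(LC_ALL, "") with
LANG=ja_JP.EUC-JP would take the fallback branch and copy the three input
bytes unchanged.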

It seems this bug has been fixed in revision 1.52 of posix/regex_internal.c
(and in 1.41.2.7):

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/posix/regex_internal.c.diff?r1=1.51&r2=1.52&cvsroot=glibc

Regards,
Fumitoshi UKAI


