
Bug#187991: libc6: regex (re_exec) segfault in UTF-8 locale [Re: grep 2.5.1 segfault, and (more) color patch (again)]



Package: libc6
Version: 2.3.1-16
Severity: normal

Eric Agnew <agnew@geekhive.net> wrote:
> First, a bug report: I'm getting a segfault on grep 2.5.1 when grepping
> the edict file ( http://ftp.cc.monash.edu.au/pub/nihongo/edict.gz ):
>
>         egrep '^(.)(.)(.)\1\2\3 ' edict
>   or:
>         grep '^\(.\)\(.\)\(.\)\1\2\3 ' edict
>
> both output 13 lines and the seg fault.  strace didn't seem to tell me
> anything, and I've never been able to figure out gdb, so.. hopefully
> someone will be able to reproduce it..  For reference, I'm running
> Linux (debian/unstable) on x86.

Thanks for the report.
Note that reproducing the failure probably requires a UTF-8 locale.
The system I used happened to have fr_FR.UTF-8 installed, so I used
that, even though the data in that file is in Japanese.

On an x86 Linux system running debian/unstable (grep-2.5.1-4
and libc6-2.3.1-16), I pared it down to this:

  $ printf pMik3KTIpNwK | recode /64 \
    | LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
  Segmentation fault
  [Exit 139 (SIGSEGV)]

This also does it:

  $ grep totteringly edict|LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
  Segmentation fault
  [Exit 139 (SIGSEGV)]
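
For anyone without recode, here is a minimal sketch for inspecting the
pared-down input from the first command. It assumes GNU coreutils
`base64 -d' and `od' as stand-ins for `recode /64'; the helper name
decode_input is hypothetical. The string decodes to nine bytes, none of
which is ASCII apart from the trailing newline. They are not valid
UTF-8 (0xa4 cannot start a UTF-8 sequence), and they look like EUC-JP
kana, which would fit the edict file, though that reading is my
assumption.

```shell
# Decode the base64 test string from the pared-down reproduction.
# Assumption: coreutils `base64 -d` replaces `recode /64` here.
decode_input() {
  printf 'pMik3KTIpNwK' | base64 -d
}

# Dump the raw bytes in hex, one byte per column, without offsets.
decode_input | od -An -tx1
# prints: a4 c8 a4 dc a4 c8 a4 dc 0a
```

The output of decode_input can be piped into grep in place of the
`printf ... | recode /64' part of the pipeline.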

It looks like a problem in libc's re_exec function:

  $ LC_ALL=fr_FR.UTF-8 gdb /bin/grep
  (gdb) r -E '^(.)(.)(.)\1\2\3 ' k
  Starting program: /bin/grep -E '^(.)(.)(.)\1\2\3 ' k
  (no debugging symbols found)...(no debugging symbols found)...
  Program received signal SIGSEGV, Segmentation fault.
  0x400c9ad5 in re_exec () from /lib/libc.so.6
  (gdb)

But note that if you rebuild grep after configuring it with
`configure --with-included-regex', the resulting binary doesn't segfault.
It doesn't find any matches, either.
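
To compare two grep binaries on the same input, the exit status is
enough to tell the three outcomes apart. The following is only a
sketch: the GREP variable and the classify_status helper are
hypothetical names, and `base64 -d' (coreutils) is assumed in place of
`recode /64'. grep exits with 0 on a match, 1 on no match and 2 on
error, and the shell reports 139 (128 + 11) when the process dies from
SIGSEGV. If fr_FR.UTF-8 is not installed, grep will most likely fall
back to the C locale, so the result says nothing about the bug.

```shell
# Map a grep exit status to a human-readable outcome.
classify_status() {
  case "$1" in
    0)   echo "match found" ;;
    1)   echo "no match" ;;
    2)   echo "grep reported an error" ;;
    139) echo "killed by SIGSEGV (128 + 11)" ;;
    *)   echo "other status $1" ;;
  esac
}

# GREP is a hypothetical variable: path of the grep binary under test.
GREP=${GREP:-grep}

# Feed the decoded test input to grep in the UTF-8 locale; the status
# of the pipeline is the status of its last command, i.e. grep.
printf 'pMik3KTIpNwK' | base64 -d |
  LC_ALL=fr_FR.UTF-8 "$GREP" -nE '^(.)(.)(.)\1\2\3 ' >/dev/null 2>&1
classify_status $?
```

Run it once with GREP=/bin/grep and once with GREP pointing at the
rebuilt binary to see the difference described above.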

The same thing happens if I link grep with the
very latest regex code from glibc's CVS repository.
