
Bug#216512: marked as done (workaround for libc crashes on incomplete multibyte chars)

Your message dated Sun, 22 Apr 2007 16:53:22 +0200
with message-id <20070422145322.GA11465@volta.aurel32.net>
and subject line Bug#216512: workaround for libc crashes on incomplete multibyte chars
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Package: libc6
Version: 2.3.1-16
Severity: normal

Eric Agnew <agnew@geekhive.net> wrote:
> First, a bug report: I'm getting a segfault on grep 2.5.1 when grepping
> the edict file ( http://ftp.cc.monash.edu.au/pub/nihongo/edict.gz ):
>         egrep '^(.)(.)(.)\1\2\3 ' edict
>   or:
>         grep '^\(.\)\(.\)\(.\)\1\2\3 ' edict
> both output 13 lines and the seg fault.  strace didn't seem to tell me
> anything, and I've never been able to figure out gdb, so.. hopefully
> someone will be able to reproduce it..  For reference, I'm running
> Linux (debian/unstable) on x86.

Thanks for the report.
Note that to reproduce the failure you probably have to
be using a UTF-8 locale.  The system I used happened to have
fr_FR.UTF-8 installed, so I used that, even though the data
in that file is in Japanese.

On a system with x86 Linux debian/unstable (grep-2.5.1-4
and libc6-2.3.1-16), I pared it down to this:

  $ printf pMik3KTIpNwK | recode /64 \
    | LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
  Segmentation fault
  [Exit 139 (SIGSEGV)]

This also does it:

  $ grep totteringly edict|LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
  Segmentation fault
  [Exit 139 (SIGSEGV)]
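
To see what grep is actually being fed in the first reproducer, the Base64
payload can be decoded and dumped as hex. This is only a sketch: it assumes
GNU coreutils `base64 -d` as a stand-in for `recode /64`, and the variable
name payload_hex is made up for illustration.

```shell
# Decode the Base64 payload from the reproducer and print its bytes as hex.
# Assumption: `base64 -d` (GNU coreutils) decodes the same way `recode /64` does here.
payload_hex=$(printf pMik3KTIpNwK | base64 -d | od -An -tx1 | tr -d ' \n')
echo "$payload_hex"
# prints: a4c8a4dca4c8a4dc0a
```

Read as UTF-8, this input is malformed: it opens with a stray continuation
byte (0xa4) and ends with a lead byte (0xdc) followed directly by the
newline, i.e. an incomplete multibyte character. The bytes look like EUC-JP
text, which would fit a line taken from edict, but that is an inference from
the dump and not something stated in the original report.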

It looks like a problem in libc's re_exec function:

  $ LC_ALL=fr_FR.UTF-8 gdb /bin/grep
  (gdb) r -E '^(.)(.)(.)\1\2\3 ' k
  Starting program: /bin/grep -E '^(.)(.)(.)\1\2\3 ' k
  (no debugging symbols found)...(no debugging symbols found)...
  Program received signal SIGSEGV, Segmentation fault.
  0x400c9ad5 in re_exec () from /lib/libc.so.6

But note that if you rebuild grep by running
`configure --with-included-regex' the resulting binary doesn't segfault.
It doesn't find any matches, either.

The same thing happens if I link grep with the
very latest regex code from glibc's CVS repository.

--- End Message ---
--- Begin Message ---
Version: 2.3.6.ds1-13

I am able to reproduce the problem with sarge's glibc, but not with the
etch one. I think the bug is fixed, and I am closing it with this mail.

  .''`.  Aurelien Jarno             | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

--- End Message ---
