Bug#216512: libc6: incomplete multibyte sequences cause segfault in regexec()
Package: libc6
Version: 2.3.2.ds1-11
Severity: normal
Followup-For: Bug #216512
I ran into a similar problem because of this bug. I originally tried
to report it through GNU's bug tracker, but since that has been down
for over a week now, I'll attach my bug report here.
I believe Anders's patch will fix the problem (my test program crashes
at the same point), although I haven't tested it because I don't want
to sit through a glibc compile right now.
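Until a fixed glibc is installed, one possible stopgap is to reject
strings that are not valid multibyte text before they reach regexec().
The following is only a sketch and has not been tested against the
affected glibc: is_valid_mbs() and guarded_regexec() are hypothetical
helper names, and returning REG_NOMATCH for bad input is an assumption
about what the caller wants. It relies on the standard mbrtowc(), which
reports invalid and incomplete sequences in the current LC_CTYPE locale.

```c
#include <assert.h>
#include <locale.h>   /* the caller is expected to have called setlocale() */
#include <regex.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if s is a complete, valid multibyte
 * string in the current LC_CTYPE locale, 0 otherwise. */
int is_valid_mbs(const char *s)
{
    mbstate_t state;
    size_t left;

    assert(s != NULL);
    left = strlen(s);
    memset(&state, 0, sizeof(state));

    while (left > 0) {
        size_t n = mbrtowc(NULL, s, left, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return 0;           /* invalid or incomplete sequence */
        if (n == 0)
            break;              /* embedded NUL; cannot happen with strlen() */
        s += n;
        left -= n;
    }
    return 1;
}

/* Hypothetical wrapper: only hand validated strings to regexec().
 * Assumption: input that fails the check is treated as "no match". */
int guarded_regexec(const regex_t *preg, const char *s,
                    size_t nmatch, regmatch_t pmatch[], int eflags)
{
    if (!is_valid_mbs(s))
        return REG_NOMATCH;
    return regexec(preg, s, nmatch, pmatch, eflags);
}
```

In a UTF-8 locale the check should reject the string used in the test
program below, since 0xe2 starts a three-byte sequence and 0xc2 is not
a continuation byte. This only avoids the crash in the caller; it does
not fix the bug in regexec() itself.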
Here's my GNU bug report:
>Submitter-Id: net
>Originator: Ben Winslow
>Organization:
>Confidential: no
>Synopsis: regexec() causes SIGSEGV with invalid multibyte string
>Severity: serious
>Priority: medium
>Category: libc
>Class: sw-bug
>Release: libc-2.3.2
>Environment:
Host type: i386-pc-linux-gnu
System: Linux portal 2.6.1 #2 Thu Jan 29 01:34:38 EST 2004 i686
GNU/Linux
Architecture: i686
Addons: linuxthreads
Build CFLAGS: -g -O2
Build CC: gcc-3.3
Compiler version: 3.3.3 20031229 (prerelease) (Debian)
Kernel headers: UTS_RELEASE
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no
>Description:
regexec() (in find_collation_sequence_value) causes a segmentation
violation when an invalid multibyte string is passed in the 'string'
parameter.
Backtrace:
#0 0x400d847a in find_collation_sequence_value (mbs=0x805f170 "\xC2\xB8\xEF\xBF\xBD\bZ]TEST",
mbs_len=2) at regexec.c:3644
#1 0x400d8223 in check_node_accept_bytes (preg=0xbffff420, node_idx=1016740,
input=0x0, str_idx=1) at regexec.c:3534
#2 0x400d5ca4 in transit_state_mb (preg=0xbffff420, pstate=0x805f580,
mctx=0xbffff124) at regexec.c:2305
#3 0x400d596e in transit_state (err=0xbffff0c8, preg=0xbffff420,
mctx=0xbffff124, state=0x805f580, fl_search=0) at regexec.c:2067
#4 0x400d3951 in check_matching (preg=0xbffff420, mctx=0xbffff124,
fl_search=0, fl_longest_match=0) at regexec.c:1009
#5 0x400d3193 in re_search_internal (preg=0xbffff420,
string=0x804873f "\xEF\xBF\xBD\xC2\xB8\xEF\xBF\xBD", length=4, start=0, range=4, stop=1016740,
nmatch=0, pmatch=0x0, eflags=0) at regexec.c:744
#6 0x400d2701 in __regexec (preg=0xbffff420, string=0x804873f "\xEF\xBF\xBD\xC2\xB8\xEF\xBF\xBD",
nmatch=1016740, pmatch=0xf83a4, eflags=0) at regexec.c:221
#7 0x08048592 in main ()
>How-To-Repeat:
Set your locale to en_US.UTF-8.
Build and execute the following code:
------------------------------ 8< ------------------------------
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <locale.h>
int main(int argc, char *argv[])
{
    regex_t expression;
    char errbuf[512];
    int error;

    setlocale(LC_ALL, "");

    if ((error = regcomp(&expression, "[^a-z]test", REG_EXTENDED | REG_ICASE)) != 0) {
        regerror(error, &expression, errbuf, sizeof(errbuf));
        fprintf(stderr, "regexp compilation failed: %s\n", errbuf);
        return 1;
    }

    printf("regexec: %d\n", regexec(&expression, "\xe2\xc2\xb8\xe2", 0, NULL, 0));
    regfree(&expression);

    return 0;
}
-- System Information:
Debian Release: testing/unstable
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8
Versions of packages libc6 depends on:
ii libdb1-compat 2.1.3-7 The Berkeley database routines [gl
-- no debconf information