
Bug#216512: libc6: incomplete multibyte sequences cause segfault in regexec()



Package: libc6
Version: 2.3.2.ds1-11
Severity: normal
Followup-For: Bug #216512

I ran into a similar problem caused by this bug.  I originally tried
to report it through GNU's bug tracker, but since that has been down
for over a week now, I'm attaching my bug report here.

I believe Anders's patch will fix the problem (my test program crashes
at the same point), although I haven't tested it because I don't want
to sit through a glibc compile right now.

Here's my GNU bug report:
>Submitter-Id:  net
>Originator:    Ben Winslow
>Organization:
>Confidential:  no
>Synopsis:      regexec() causes SIGSEGV with invalid multibyte string
>Severity:      serious
>Priority:      medium
>Category:      libc
>Class:         sw-bug
>Release:       libc-2.3.2
>Environment:

Host type: i386-pc-linux-gnu
System: Linux portal 2.6.1 #2 Thu Jan 29 01:34:38 EST 2004 i686
GNU/Linux
Architecture: i686

Addons: linuxthreads
Build CFLAGS: -g -O2
Build CC: gcc-3.3
Compiler version: 3.3.3 20031229 (prerelease) (Debian)
Kernel headers: UTS_RELEASE
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no

>Description:

regexec() (in find_collation_sequence_value) causes a segmentation
violation when an invalid multibyte string is passed in the 'string'
parameter.

Backtrace:
#0  0x400d847a in find_collation_sequence_value (mbs=0x805f170 "\xC2\xB8\xEF\xBF\xBD\bZ]TEST",
    mbs_len=2) at regexec.c:3644
#1  0x400d8223 in check_node_accept_bytes (preg=0xbffff420, node_idx=1016740,
    input=0x0, str_idx=1) at regexec.c:3534
#2  0x400d5ca4 in transit_state_mb (preg=0xbffff420, pstate=0x805f580,
    mctx=0xbffff124) at regexec.c:2305
#3  0x400d596e in transit_state (err=0xbffff0c8, preg=0xbffff420,
    mctx=0xbffff124, state=0x805f580, fl_search=0) at regexec.c:2067
#4  0x400d3951 in check_matching (preg=0xbffff420, mctx=0xbffff124,
    fl_search=0, fl_longest_match=0) at regexec.c:1009
#5  0x400d3193 in re_search_internal (preg=0xbffff420,
    string=0x804873f "\xe2\xc2\xb8\xe2", length=4, start=0, range=4, stop=1016740,
    nmatch=0, pmatch=0x0, eflags=0) at regexec.c:744
#6  0x400d2701 in __regexec (preg=0xbffff420, string=0x804873f "\xe2\xc2\xb8\xe2",
    nmatch=1016740, pmatch=0xf83a4, eflags=0) at regexec.c:221
#7  0x08048592 in main ()


>How-To-Repeat:

Set your locale to en_US.UTF-8.
Build and execute the following code:

------------------------------ 8< ------------------------------
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <locale.h>

int main(int argc, char *argv[])
{
        regex_t expression;
        char errbuf[512];
        int error;

        setlocale(LC_ALL, "");

        if ((error = regcomp(&expression, "[^a-z]test", REG_EXTENDED | REG_ICASE)) != 0) {
                regerror(error, &expression, errbuf, sizeof(errbuf));
                fprintf(stderr, "regexp compilation failed: %s\n", errbuf);
                return 1;
        }

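        /* "\xe2" starts a 3-byte UTF-8 sequence, but neither occurrence
           is completed, so this string is invalid in a UTF-8 locale. */
        printf("regexec: %d\n", regexec(&expression, "\xe2\xc2\xb8\xe2", 0, NULL, 0));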

        regfree(&expression);

        return 0;
}
------------------------------ 8< ------------------------------
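
Until a fixed glibc is installed, one possible workaround is to reject
invalid multibyte strings before they reach regexec().  The following is
only a minimal sketch under my own assumptions, not a tested fix and not
part of glibc: is_valid_mbs() and checked_regexec() are hypothetical names,
and treating an invalid string as REG_NOMATCH is an arbitrary choice that
may not suit every caller.  It relies on the standard mbrtowc(), which
reports invalid sequences as (size_t)-1 and incomplete ones as (size_t)-2.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of glibc): returns 1 if s is a complete,
   valid multibyte string in the current LC_CTYPE locale, 0 otherwise. */
int is_valid_mbs(const char *s)
{
        mbstate_t state;
        size_t left = strlen(s);

        memset(&state, 0, sizeof(state));
        while (left > 0) {
                size_t n = mbrtowc(NULL, s, left, &state);

                /* (size_t)-1: invalid sequence; (size_t)-2: incomplete one */
                if (n == (size_t)-1 || n == (size_t)-2)
                        return 0;
                if (n == 0)     /* embedded NUL; cannot happen within strlen(s) */
                        break;
                s += n;
                left -= n;
        }
        return 1;
}

/* Hypothetical wrapper: only hand well-formed strings to regexec().
   Invalid input is reported as "no match" (an assumption of this sketch). */
int checked_regexec(const regex_t *preg, const char *string,
                    size_t nmatch, regmatch_t pmatch[], int eflags)
{
        if (!is_valid_mbs(string))
                return REG_NOMATCH;
        return regexec(preg, string, nmatch, pmatch, eflags);
}
```

In the test program above, calling checked_regexec() in place of regexec()
(after the setlocale() call) would presumably skip the crashing code path
for the "\xe2\xc2\xb8\xe2" input, at the cost of one extra pass over the
string.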

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8

Versions of packages libc6 depends on:
ii  libdb1-compat                 2.1.3-7    The Berkeley database routines [gl

-- no debconf information



