
Bug#254314: narrowing down the bug



package libc6
retitle 254314 regexec(): segfaults in UTF-8 locales under some circumstances
thanks

Hi,

  the attached C program reproduces the bug in a UTF-8 locale. These
  are the circumstances under which the bug seems to get triggered:

    - the locale must be UTF-8.
    - an invalid UTF-8 string must be given as input.
    - the regular expression must be compiled with the REG_ICASE flag.
    - the regular expression must contain a range, e.g. "[0-1]".
      Oddly (to me), "[a-b]" doesn't trigger the bug, but "[a-b]+"
      does. I guess it is just a matter of forcing a call to
      find_collation_sequence_value().

  If this is "expected behavior" and callers are expected to always give
  *valid* input data, please reassign back to mutt.
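
  If validation does turn out to be the caller's job, a check along
  these lines could run before regexec(). This is only a sketch:
  is_valid_utf8() is a hypothetical helper, not something taken from
  libc or mutt, and it checks UTF-8 well-formedness by hand instead of
  consulting the current locale.

```c
/*
 * Hypothetical helper (an assumption for illustration, not libc or mutt
 * code): returns 1 if the NUL-terminated string is well-formed UTF-8,
 * 0 otherwise.
 */
static int
is_valid_utf8 (const char *str)
{
    const unsigned char *s = (const unsigned char *) str;

    while (*s)
    {
        int n;              /* continuation bytes still expected */
        unsigned long cp;   /* code point decoded so far */
        unsigned long min;  /* smallest code point legal for this length */

        if (*s < 0x80)                { s++; continue; }  /* plain ASCII */
        else if ((*s & 0xE0) == 0xC0) { n = 1; cp = *s & 0x1F; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { n = 2; cp = *s & 0x0F; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { n = 3; cp = *s & 0x07; min = 0x10000; }
        else
            return 0;  /* stray continuation byte or invalid lead byte */

        s++;
        while (n-- > 0)
        {
            /* A NUL here means the sequence was truncated. */
            if ((*s & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*s & 0x3F);
            s++;
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }

    return 1;
}
```

  With this, the input used in the attached program, "\xc3\xa1\xc3", is
  rejected because its last byte starts a two-byte sequence that is
  never completed, so a caller could skip the regexec() call for it.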

  cheers,

-- 
Adeodato Simó
    EM: asp16 [ykwim] alu.ua.es | PK: DA6AE621
 
Arguing with an engineer is like wrestling with a pig in mud: after a
while, you realize the pig is enjoying it.
/*
 * Little C program to reproduce Debian bug #254314.
 * Tested with LANG=es_ES.UTF-8 and libc6 version 2.3.2.ds1-13.
 *
 * Adeodato Simó <asp16@alu.ua.es>, 2004-06-15, public domain.
 */

#include <stdio.h>
#include <regex.h>
#include <locale.h>

int
main (void)
{
    regex_t preg;

    char *r = "[0-1]"; /* also [a-b]+ */
    char *s = "\xc3\xa1\xc3"; /* Invalid UTF-8 string! */

    setlocale(LC_ALL, "");

    if (regcomp(&preg, r, 0) == 0) /* ! REG_ICASE */
    {
      regexec(&preg, s, 0, NULL, 0);
      printf("%s\n", "case-sensitive: successful");
      regfree(&preg); /* release the pattern before reusing preg */
    }

    if (regcomp(&preg, r, REG_ICASE) == 0)
    {
      regexec(&preg, s, 0, NULL, 0);
      printf("%s\n", "case-insensitive: successful"); /* not reached for UTF-8 locales */
      regfree(&preg);
    }

    return 0;
}
