Bug#254314: narrowing down the bug
package libc6
retitle 254314 regexec(): segfaults in UTF-8 locales under some circumstances
thanks
Hi,
the attached C program reproduces the bug in a UTF-8 locale. These
are the circumstances under which the bug seems to get triggered:
- the locale must be UTF-8.
- an invalid UTF-8 string must be given as input.
- the regular expression must be compiled with the REG_ICASE flag.
- the regular expression must contain a range, e.g. "[0-1]".
Oddly (to me), "[a-b]" doesn't trigger the bug, but "[a-b]+"
does. I guess it is just a matter of forcing a call to
find_collation_sequence_value().
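For anyone checking the first condition on their own system, here is a minimal sketch of how a program can tell whether the locale it picked up is a UTF-8 one. The helper name locale_is_utf8() is made up for illustration, and the comparison against the exact string "UTF-8" assumes glibc's spelling of the codeset name.

```c
#include <langinfo.h>
#include <locale.h>   /* for setlocale() in the caller */
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the active locale uses the UTF-8 codeset.
 * Assumes the codeset is reported as "UTF-8", as glibc does. */
static bool
locale_is_utf8 (void)
{
  /* nl_langinfo(CODESET) names the character encoding of the locale
   * currently selected with setlocale(). */
  return strcmp (nl_langinfo (CODESET), "UTF-8") == 0;
}
```

It only gives a meaningful answer after setlocale(LC_ALL, "") has been called, since a program starts in the "C" locale.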
If this is "expected behavior" and callers are expected to always give
*valid* input data, please reassign back to mutt.
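If validation does turn out to be the caller's job, the check could look roughly like the sketch below. This is only an illustration: is_valid_utf8() is a hypothetical helper, not something libc or mutt provides, and it checks the UTF-8 encoding by hand rather than asking the locale.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if str is a well-formed UTF-8 string. */
static bool
is_valid_utf8 (const char *str)
{
  const unsigned char *p = (const unsigned char *) str;

  while (*p != '\0')
    {
      size_t need;        /* continuation bytes still expected */
      unsigned long cp;   /* code point being decoded */
      unsigned long min;  /* smallest code point allowed for this length */

      if (*p < 0x80)
        { p++; continue; }                          /* plain ASCII */
      else if ((*p & 0xe0) == 0xc0)
        { need = 1; cp = *p & 0x1f; min = 0x80; }
      else if ((*p & 0xf0) == 0xe0)
        { need = 2; cp = *p & 0x0f; min = 0x800; }
      else if ((*p & 0xf8) == 0xf0)
        { need = 3; cp = *p & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      p++;
      while (need-- > 0)
        {
          /* A terminating NUL also fails this test, so a truncated
           * sequence at the end of the string is rejected. */
          if ((*p & 0xc0) != 0x80)
            return false;
          cp = (cp << 6) | (*p & 0x3f);
          p++;
        }

      /* Reject overlong forms, surrogates and out-of-range values. */
      if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return false;
    }

  return true;
}
```

With the input from the test program, "\xc3\xa1\xc3", the first two bytes decode fine and the lone trailing 0xc3 is rejected, so a caller could skip regexec() for that string.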
cheers,
--
Adeodato Simó
EM: asp16 [ykwim] alu.ua.es | PK: DA6AE621
Arguing with an engineer is like wrestling with a pig in mud: after a
while, you realize the pig is enjoying it.
/*
 * Little C program to reproduce Debian bug #254314.
 * Tested with LANG=es_ES.UTF-8 and libc6 version 2.3.2.ds1-13.
 *
 * Adeodato Simó <asp16@alu.ua.es>, 2004-06-15, public domain.
 */

#include <stdio.h>
#include <regex.h>
#include <locale.h>

int
main (void)
{
  regex_t preg;
  const char *r = "[0-1]";        /* also [a-b]+ */
  const char *s = "\xc3\xa1\xc3"; /* Invalid UTF-8 string! */

  setlocale(LC_ALL, "");  /* pick up the locale from the environment */

  if (regcomp(&preg, r, 0) == 0)  /* ! REG_ICASE */
    {
      regexec(&preg, s, 0, NULL, 0);
      printf("%s\n", "case-sensitive: successful");
      regfree(&preg);  /* release the pattern before reusing preg */
    }

  if (regcomp(&preg, r, REG_ICASE) == 0)
    {
      regexec(&preg, s, 0, NULL, 0);
      printf("%s\n", "case-insensitive: successful"); /* not reached for UTF-8 locales */
      regfree(&preg);
    }

  return 0;
}