Control: reassign -1 haskell-regex-posix
Control: forwarded -1 TextRegexLazy@personal.mightyreason.com
Control: tag + upstream

Hi,

On Tuesday, 06.05.2014, at 09:25 +0200, Aurelien Jarno wrote:
> What you see is actually very likely locale related. The "\242"
> character is not valid in unicode locale. If you run your code using a
> unicode locale, as regcomp() and regexec() interpret the regex and the
> string as unicode, the "\242" character is ignored.
>
> The behavior you describe can be reproduced in your C example by adding
> a call to setlocale(LC_ALL, "C.UTF-8") at the beginning of your code.

Thanks! Indeed, this is observable on the Haskell level:

Prelude System.Locale.SetLocale Text.Regex> setLocale LC_ALL (Just "C.UTF-8")
Just "C.UTF-8"
Prelude System.Locale.SetLocale Text.Regex> let s = "ò"
Prelude System.Locale.SetLocale Text.Regex> matchRegex (mkRegex $ "^.*$") s
Nothing
Prelude System.Locale.SetLocale Text.Regex> setLocale LC_ALL (Just "C")
Just "C"
Prelude System.Locale.SetLocale Text.Regex> matchRegex (mkRegex $ "^.*$") s
Just []

The problem seems to be that Text.Regex.Posix uses newCAString and not
newCString. The latter converts the Haskell Unicode string to its byte
representation in the current locale's encoding, and if I do that
conversion by hand, I get:

Prelude Foreign.C.String Text.Regex.Posix.Wrap> cs <- newCString "\242"
Prelude Foreign.C.String Text.Regex.Posix.Wrap> cp <- newCString "."
Prelude Foreign.C.String Text.Regex.Posix.Wrap> Right r2 <- wrapCompile 0 0 cp
Prelude Foreign.C.String Text.Regex.Posix.Wrap> wrapTest r2 cs
Right True

(Compare that with what I did in https://bugs.debian.org/702617#22)

> I am therefore tempted to reassign the bug back to
> libghc-regex-compat-dev. Do you agree?

Yes, just done that.

Dear Christopher: Is there a good reason why regex-posix uses newCAString,
and not newCString, when converting Haskell Strings to C strings?
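To make the difference concrete, here is a minimal standalone sketch that prints the bytes each kind of conversion produces for "\242" (ò). The helper names caBytes and utf8Bytes are hypothetical and not part of regex-posix. It assumes GHC's base library: withCAStringLen (the bracketed sibling of newCAString) truncates each Char to its low 8 bits, while GHC.Foreign.withCStringLen is given utf8 explicitly to stand in for newCString under a UTF-8 locale, so the output does not depend on the machine's locale. A lone 0xF2 byte is not a complete UTF-8 sequence, which is consistent with regexec() not matching "." against it in a C.UTF-8 locale.

```haskell
import Data.Word (Word8)
import Foreign.C.String (withCAStringLen)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (utf8)

-- Hypothetical helper: bytes produced by the "CA" marshalling, which
-- truncates every Char to its low 8 bits (what newCAString does).
caBytes :: String -> IO [Word8]
caBytes s = withCAStringLen s $ \(p, n) -> peekArray n (castPtr p)

-- Hypothetical helper: bytes produced by encoding explicitly as UTF-8,
-- standing in for newCString when the locale encoding is UTF-8.
utf8Bytes :: String -> IO [Word8]
utf8Bytes s = GHC.withCStringLen utf8 s $ \(p, n) -> peekArray n (castPtr p)

main :: IO ()
main = do
  caBytes "\242" >>= print    -- prints [242]     (a lone 0xF2 byte)
  utf8Bytes "\242" >>= print  -- prints [195,178] (0xC3 0xB2, UTF-8 for U+00F2)
```

Under this assumption, a C library running in a UTF-8 locale receives a valid two-byte character only in the second case.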
Thanks,
Joachim

Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
  Debian Developer
  nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: F0FBF51F
  JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata