
Bug#702617: regex /./ fails to match certain characters



Control: reassign -1 haskell-regex-posix
Control: forwarded -1 TextRegexLazy@personal.mightyreason.com
Control: tag + upstream

Hi,

On Tuesday, 06.05.2014, at 09:25 +0200, Aurelien Jarno wrote:
> What you see is actually very likely locale-related. The "\242"
> character is not valid in a Unicode (UTF-8) locale. If you run your
> code in such a locale, regcomp() and regexec() interpret the regex and
> the string as UTF-8, so the "\242" character is ignored.
> 
> The behavior you describe can be reproduced in your C example by adding
> a call to setlocale(LC_ALL, "C.UTF-8") at the beginning of your code.

Thanks! Indeed observable on the Haskell level:

Prelude System.Locale.SetLocale Text.Regex> setLocale LC_ALL (Just "C.UTF-8")
Just "C.UTF-8"
Prelude System.Locale.SetLocale Text.Regex> let s = "ò"
Prelude System.Locale.SetLocale Text.Regex> matchRegex (mkRegex $ "^.*$") s
Nothing
Prelude System.Locale.SetLocale Text.Regex> setLocale LC_ALL (Just "C")
Just "C"
Prelude System.Locale.SetLocale Text.Regex> matchRegex (mkRegex $ "^.*$") s
Just []

The problem seems to be that Text.Regex.Posix uses newCAString and not
newCString. The latter encodes the Haskell Unicode string into bytes
using the current locale's encoding, and if I do the conversion by hand,
I get:

Prelude Foreign.C.String Text.Regex.Posix.Wrap> cs <- newCString "\242"
Prelude Foreign.C.String Text.Regex.Posix.Wrap> cp <- newCString "."
Prelude Foreign.C.String Text.Regex.Posix.Wrap> Right r2 <- wrapCompile 0 0 cp
Prelude Foreign.C.String Text.Regex.Posix.Wrap> wrapTest r2 cs
Right True

(Compare that with what I did in https://bugs.debian.org/702617#22)
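
To make the difference concrete, here is a minimal sketch (not taken from
regex-posix; the helper names bytesTruncated and bytesUtf8 are made up for
illustration) that dumps the bytes each marshalling approach hands to C.
It assumes GHC, and uses GHC.Foreign.withCStringLen with an explicit utf8
encoding as a stand-in for what newCString does in a UTF-8 locale, so the
result does not depend on the locale the program runs in:

```haskell
import Data.Word (Word8)
import Foreign.C.String (withCAStringLen)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GF
import System.IO (utf8)

-- Bytes produced by the "CA" marshalling family (as newCAString does):
-- every Char is truncated to its low 8 bits, no encoding is applied.
bytesTruncated :: String -> IO [Word8]
bytesTruncated s = withCAStringLen s $ \(p, n) -> peekArray n (castPtr p)

-- Bytes produced when the string is encoded as UTF-8, which is what
-- newCString would be expected to do under a UTF-8 locale.
bytesUtf8 :: String -> IO [Word8]
bytesUtf8 s = GF.withCStringLen utf8 s $ \(p, n) -> peekArray n (castPtr p)

main :: IO ()
main = do
  bytesTruncated "\242" >>= print  -- [242]
  bytesUtf8 "\242" >>= print       -- [195,178]
```

The single byte 242 (0xF2) is not a complete UTF-8 sequence on its own,
which would explain why "." does not match it once the C library treats
the input as UTF-8, whereas the two-byte encoding 195,178 is well-formed.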

> I am therefore tempted to reassign the bug back to
> libghc-regex-compat-dev. Do you agree?

Yes, just done that.

Dear Christopher: Is there a good reason why regex-posix uses
newCAString, and not newCString, when converting Haskell Strings to C
strings?

Thanks,
Joachim


Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
Debian Developer
  nomeata@debian.org | ICQ# 74513189 | GPG-Keyid: F0FBF51F
  JID: nomeata@joachim-breitner.de | http://people.debian.org/~nomeata


