
Bug#570929: Hungarian locale: "zs" is treated as a single letter, with undesirable consequences



Hi,

I have no clue about the rest of these, but

Andras Korn wrote:
> On Mon, Feb 22, 2010 at 11:07:21AM +0100, Andras Korn wrote:

>> 2. "zs" is the last letter of the Hungarian alphabet; therefore, no sane
>> character range in a regular expression can include it ("[a-zs]" would be
>> ambiguous because there isn't a "zs" glyph).

Would [a-[.zs.]] work?

See
http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

Much of the behavior of regular expressions in non-C locales is
counterintuitive, so it would help to point out whether each example
violates some rule of the standard or only common sense (both are
important, of course).

> The problem also affects sed(1) similarly:
> 
> % echo azsa | LANG=hu_HU.UTF-8 sed -n "/^a[^a-z]a$/p"
> azsa

sed uses re_compile_pattern() and related functions from glibc (same
maintainers as the locales package).  I don’t know whether grep does too.
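
In case it is useful for testing, here is a rough sketch of how the
reported case and the [.zs.] range could be compared side by side.
try_match is just a helper name I made up, and the check for an
installed hu_HU.UTF-8 locale assumes it shows up in "locale -a" as
hu_HU.utf8 or hu_HU.UTF-8.  I have not verified what the Hungarian
cases print, so no expected output is given for them.

```shell
#!/bin/sh
# try_match LOCALE PATTERN INPUT
# Prints INPUT if sed, running under LOCALE, matches it against PATTERN
# (a POSIX basic regular expression); prints nothing otherwise.
# try_match is a hypothetical helper, not part of sed or glibc.
# PATTERN must not contain "/", since it is pasted into a sed address.
try_match() {
    printf '%s\n' "$3" | LC_ALL="$1" sed -n "/$2/p"
}

# Baseline: in the C locale "zs" is two separate characters, so a single
# bracket expression cannot consume both and "azsa" is not printed.
try_match C '^a[^a-z]a$' azsa

# Run the Hungarian cases only if the locale appears to be installed
# (assumption: "locale -a" lists it as hu_HU.utf8 or hu_HU.UTF-8).
if locale -a 2>/dev/null | grep -qi '^hu_HU\.utf-*8$'; then
    # The case from the report.
    try_match hu_HU.UTF-8 '^a[^a-z]a$' azsa || echo "sed failed"
    # The same test with the range ending at the collating symbol [.zs.].
    try_match hu_HU.UTF-8 '^a[^a-[.zs.]]a$' azsa ||
        echo "sed rejected the pattern"
fi
```

If the second Hungarian case prints nothing while the first prints
"azsa", the collating-symbol range is doing what one would hope.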

Hope that helps,
Jonathan


