Bug#570929: Hungarian locale: "zs" is treated as a single letter, with undesirable consequences



On Tue, Feb 23, 2010 at 10:29:25PM -0600, Jonathan Nieder wrote:

> >> 2. "zs" is the last letter of the Hungarian alphabet; therefore, no sane
> >> character range in a regular expression can include it ("[a-zs]" would be
> >> ambiguous because there isn't a "zs" glyph).
> 
> Would [a-[.zs.]] work?

No, because apparently [.zs.] isn't a valid collating element:

% echo azsa | LANG=hu_HU.UTF-8 grep "^a[a-[.zs.]]a$"
grep: Invalid collation character
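
For comparison, here is a minimal sketch of the same kind of match run in the C locale, where a bracket expression only ever sees single characters, so "z" and "s" are matched one after the other. It assumes a POSIX-style grep; the variable name "result" is only illustrative and not part of the original report.

```shell
# In the C locale there are no multi-character collating elements,
# so [a-z] matches "z" and "s" as two separate characters.
# LC_ALL=C is set for the grep invocation only.
result=$(echo azsa | LC_ALL=C grep '^a[a-z]*a$')
echo "$result"
```

This does not use the Hungarian locale at all, so it says nothing about how "zs" is collated there; it only shows the baseline behaviour without locale-specific collation.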

> See
> http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

That was helpful, thanks - I didn't know about collating elements in REs.

> Lots of the behavior of regular expressions in non-C locales is
> counterintuitive, so it might be helpful to point out if each example
> violates some rule of the standard or only common sense (both are
> important, of course).

Uh, that standard is too dense for me; I'll pass on that and can only vouch
for common sense.

Andras

-- 
 Andras Korn <korn at elan.rulez.org> - <http://chardonnay.math.bme.hu/~korn/>
                    My new year's resolution is 1920x1080.


