Bug#570929: Hungarian locale: "zs" is treated as a single letter, with undesirable consequences
On Tue, Feb 23, 2010 at 10:29:25PM -0600, Jonathan Nieder wrote:
> >> 2. "zs" is the last letter of the Hungarian alphabet; therefore, no sane
> >> character range in a regular expression can include it ("[a-zs]" would be
> >> ambiguous because there isn't a "zs" glyph).
> Would [a-[.zs.]] work?
No, because apparently [.zs.] isn't a valid collating element:
% echo azsa | LANG=hu_HU.UTF-8 grep "^a[a-[.zs.]]a$"
grep: Invalid collation character
That was helpful, thanks - I didn't know about collating elements in REs.
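For comparison, here is a rough sketch of one way to sidestep the range problem. It is only an assumption on my part, not something the hu_HU locale definition or the standard prescribes: force the C locale, so that [a-z] is a plain byte range, and list the "zs" digraph as an explicit alternative. The `result` variable is just an illustrative name.

```shell
# Workaround sketch (assumption, not a fix for the locale itself):
# in the C locale [a-z] matches single ASCII lowercase letters only,
# so the "zs" digraph is listed as an explicit alternative.
result=$(echo azsa | LC_ALL=C grep -E '^a([a-z]|zs)a$')
echo "$result"    # prints "azsa"
```

The obvious cost is that under LC_ALL=C the accented Hungarian letters no longer fall inside [a-z], so this only helps when the input is plain ASCII.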
> Lots of the behavior of regular expressions in non-C locales is
> counterintuitive, so it might be helpful to point out if each example
> violates some rule of the standard or only common sense (both are
> important, of course).
Uh, that standard is too dense for me; I'll pass on that and can only vouch
for common sense.
Andras Korn <korn at elan.rulez.org> - <http://chardonnay.math.bme.hu/~korn/>
My new year's resolution is 1920x1080.