Bug#570929: Hungarian locale: "zs" is treated as a single letter, with undesirable consequences
Hi,
I have no clue about the rest of these, but
Andras Korn wrote:
> On Mon, Feb 22, 2010 at 11:07:21AM +0100, Andras Korn wrote:
>> 2. "zs" is the last letter of the Hungarian alphabet; therefore, no sane
>> character range in a regular expression can include it ("[a-zs]" would be
>> ambiguous because there isn't a "zs" glyph).
Would [a-[.zs.]] work?
See
http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
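One way to check this would be something like the sketch below. The helper name try_range is made up for illustration, and the hu_HU.UTF-8 locale has to be generated on the system (otherwise the tools quietly fall back to the C locale). It only reports grep's exit status, so it shows whether the bracket expression is accepted at all and whether the input matches. It does not show why.

```shell
# try_range LOCALE PATTERN INPUT   (hypothetical helper, not part of any package)
# Feeds INPUT to grep under LOCALE and prints:
#   "match"     grep exited 0 (pattern accepted, line matched)
#   "no match"  grep exited 1 (pattern accepted, line did not match)
#   "error"     any other status (e.g. the pattern was rejected)
try_range() {
    printf '%s\n' "$3" | LC_ALL="$1" grep -q -e "$2" 2>/dev/null
    case $? in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "error" ;;
    esac
}

# Example calls (results depend on the installed locale data):
#   try_range hu_HU.UTF-8 '^a[a-[.zs.]]a$' azsa
#   try_range hu_HU.UTF-8 '^a[a-z]a$'      azsa
```

Comparing the two calls would show whether naming the collating element explicitly changes the outcome.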
Much of the behavior of regular expressions in non-C locales is
counterintuitive, so it might be helpful to point out whether each
example violates a rule of the standard or only common sense (both are
important, of course).
> The problem also affects sed(1) similarly:
>
> % echo azsa | LANG=hu_HU.UTF-8 sed -n "/^a[^a-z]a$/p"
> azsa
sed uses re_compile_pattern() and related functions from glibc (same
maintainers as locales). I don’t know whether grep does as well.
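A rough way to see whether the two tools at least agree on a given case is sketched below. The function name same_behaviour is invented here, and the pattern must be a basic regular expression that contains no "/" since it is pasted into a sed address. Agreement would not prove that both use the same matcher. Disagreement would suggest they do not.

```shell
# same_behaviour LOCALE BRE INPUT   (hypothetical helper)
# Prints "same" if sed and grep print the same thing for INPUT, else "differ".
same_behaviour() {
    # Lines that sed prints when the address /BRE/ matches.
    sed_out=$(printf '%s\n' "$3" | LC_ALL="$1" sed -n "/$2/p" 2>/dev/null)
    # Lines that grep prints for the same BRE.
    grep_out=$(printf '%s\n' "$3" | LC_ALL="$1" grep -e "$2" 2>/dev/null)
    if [ "$sed_out" = "$grep_out" ]; then
        echo "same"
    else
        echo "differ"
    fi
}

# Example call with the pattern from the report:
#   same_behaviour hu_HU.UTF-8 '^a[^a-z]a$' azsa
```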
Hope that helps,
Jonathan