
Re: egrep oddity

On Mon, 06 Feb 2012 00:15:14 +0100, Tomas Volka wrote:

> On Ne 05-02-12 | 23:04, Sven Joachim wrote:


>> > The "^[A-Z]" range will never match a line beginning with "a", since
>> > the range matches only uppercase characters.
>> Not quite true, this very much depends on the locale.
>> Sven
> Tried this under the cs_CZ.UTF-8 and C locales, and it behaves as I
> outlined. I'm curious under which locale the result is different, as
> I've never experienced such behavior.

"man egrep" (section "Character Classes and Bracket Expressions") seems
to agree with Sven's assertion, although it does not specify how
individual locales differ:

For example, in the default C locale, [a-d] is equivalent to
[abcd]. Many locales sort characters in dictionary order, and in
these locales [a-d] is typically not equivalent to [abcd]; it might be
equivalent to [aBbCcDd], for example. To obtain the traditional
interpretation of bracket expressions, you can use the C locale
by setting the LC_ALL environment variable to the value C.
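To make the man page's point concrete, here is a small Python sketch (a model, not how grep is implemented) that treats a bracket range as "every character that collates between the two endpoints". The C locale is modelled as plain code-point order; DICT_ORDER is a hypothetical dictionary-style collation, assumed only for illustration, in which each uppercase letter sorts just before its lowercase partner. Under that assumed order, [a-d] covers aBbCcDd (the man page's example) and [A-Z] also covers most lowercase letters, which is how "^[A-Z]" could match a line starting with "a". Real behavior depends on the locale's collation data and the C library in use.

```python
import string

# C locale: ranges follow byte (code point) order. Only ASCII is
# modelled here, for simplicity.
C_ORDER = [chr(i) for i in range(128)]

# HYPOTHETICAL "dictionary order" collation, assumed for illustration:
# each uppercase letter sorts directly before its lowercase partner
# (A a B b ... Z z). Real locales take their order from the C library's
# collation tables, which are far more elaborate.
DICT_ORDER = [ch for pair in zip(string.ascii_uppercase, string.ascii_lowercase)
              for ch in pair]


def bracket_range(lo: str, hi: str, order: list[str]) -> str:
    """Characters covered by the bracket range [lo-hi] when the range
    means "everything collating from lo through hi" in `order`."""
    start, end = order.index(lo), order.index(hi)
    return "".join(order[start:end + 1])


def starts_in_range(line: str, lo: str, hi: str, order: list[str]) -> bool:
    """Rough model of the pattern ^[lo-hi]: is the first character in range?"""
    return bool(line) and line[0] in bracket_range(lo, hi, order)


print(bracket_range("a", "d", C_ORDER))                # abcd
print(bracket_range("a", "d", DICT_ORDER))             # aBbCcDd
print(starts_in_range("apple", "A", "Z", C_ORDER))     # False
print(starts_in_range("apple", "A", "Z", DICT_ORDER))  # True
```

The C_ORDER results correspond to the "traditional interpretation" the man page says you get with LC_ALL=C, which is why forcing that locale makes "^[A-Z]" reliably uppercase-only.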
