
Re: egrep oddity



On Mon, 06 Feb 2012 00:15:14 +0100, Tomas Volka wrote:

> On Ne 05-02-12 | 23:04, Sven Joachim wrote:

(...)

>> > The "^[A-Z]" range will never match a line beginning with "a", since the
>> > range matches only uppercase characters.
>> 
>> Not quite true, this very much depends on the locale.
>> 
>> Sven
> 
> Tried this under the cs_CZ.UTF-8 and C locales and it behaves as I outlined.
> I'm curious under which locale the result differs, as I've never
> experienced such behavior.

The "Character Classes and Bracket Expressions" section of "man egrep"
seems to agree with Sven's assertion, although it does not say how
specific locales differ.

***
For example, in the default C locale, [a-d] is equivalent to
[abcd]. Many locales sort characters in dictionary order, and in
these locales [a-d] is typically not equivalent to [abcd]; it might be
equivalent to [aBbCcDd], for example. To obtain the traditional
interpretation of bracket expressions, you can use the C locale
by setting the LC_ALL environment variable to the value C.
***
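
A quick way to check this yourself is sketched below. The helper names
(sample, match_c, match_in, match_upper) are made up for illustration and
are not part of grep. Only the C-locale result is predictable; what
match_in prints depends on the locale you pass, on whether that locale is
installed, and on the libc and grep versions in use.

```shell
# Hypothetical helpers for illustration only (not part of grep).

# Sample input: one line starting lowercase, one starting uppercase.
sample() { printf 'apple\nBanana\n'; }

# Traditional interpretation: in the C locale, [A-Z] covers only the
# 26 ASCII uppercase letters, so only "Banana" is printed.
match_c() { sample | LC_ALL=C grep -E '^[A-Z]'; }

# Same pattern under a locale given as the first argument.
# The result is locale- and version-dependent, so no output is promised.
match_in() { sample | LC_ALL="$1" grep -E '^[A-Z]'; }

# The POSIX character class asks for uppercase letters by property
# instead of by a collation range.
match_upper() { sample | grep -E '^[[:upper:]]'; }
```

For example, run "match_c" and then "match_in cs_CZ.UTF-8" (or any other
locale listed by "locale -a") and compare the two outputs. If they differ,
that locale collates in dictionary order in the sense of the man page
excerpt. Using [[:upper:]] avoids relying on range ordering at all.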

Greetings,

-- 
Camaleón

