
Re: egrep oddity

On 2012-02-06 11:50:16 -0700, Bob Proulx wrote:
> Vincent Lefevre wrote:
> > But the grep man page still says:
> > 
> >   Within a  bracket  expression,  a  range  expression  consists  of  two
> >   characters separated by a hyphen.  It matches any single character that
> >   sorts  between  the  two  characters,  inclusive,  using  the  locale's
> >   collating  sequence  and  character set.  For example, in the default C
> >   locale, [a-d] is equivalent to [abcd].  Many locales sort characters in
> >   dictionary   order,  and  in  these  locales  [a-d]  is  typically  not
> >   equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example.
> >   To  obtain  the  traditional interpretation of bracket expressions, you
> >   can use the C locale by setting the LC_ALL environment variable to  the
> >   value C.
> I don't see any problem with that wording.  The opening for almost any
> behavior comes from "using the locale's collating sequence and
> character set" which isn't defined by grep but is defined by libc.
> Was there something there in particular that you didn't like?

The problem is precisely that grep no longer follows the locale's
collating sequence. For instance, even though en_US.utf8 uses
dictionary order (as seen with "sort"), [a-d] is equivalent
to [abcd], not to something that would include B and C.
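
A rough way to check this is sketched below. It assumes the
en_US.utf8 locale is installed; the helper name "letters" is only
for illustration. The output of the last two commands depends on the
grep and libc versions, so no expected output is given for them.

```shell
# Hypothetical helper: print the test letters, one per line.
letters() { printf '%s\n' a B b C c D d; }

# C locale: [a-d] matches exactly a, b, c and d.
letters | LC_ALL=C grep '[a-d]' | tr '\n' ' '; echo
# prints: a b c d

# Assumption: en_US.utf8 is available on this system.
# Collation order as seen by sort:
letters | LC_ALL=en_US.utf8 sort | tr '\n' ' '; echo

# What [a-d] matches in the same locale, for comparison with the order above:
letters | LC_ALL=en_US.utf8 grep '[a-d]' | tr '\n' ' '; echo
```

If grep followed the collating sequence, the last two lines would
agree on which letters lie between a and d.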

So, where is the range specified?

Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
