
Re: match across line using grep



On 08/03/2010 11:28 AM, Andre Majorel wrote:
> On 2010-08-03 09:57 -0700, Bob McGowan wrote:
>> On 08/03/2010 05:39 AM, Andre Majorel wrote:
>>> On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
>>>> On 2010年08月03日 17:53, Andre Majorel wrote:
>>>>>>> $ printf 'a\nb' | grep -zo a.*b
>>>>>>>

<--deleted-->

> Fun, eh ? Maybe the answer is in there :
> 
>   $ locale
>   LANG=
>   LC_CTYPE=en_US
>   LC_NUMERIC="POSIX"
>   LC_TIME="POSIX"
>   LC_COLLATE=C
>   LC_MONETARY="POSIX"
>   LC_MESSAGES="POSIX"
>   LC_PAPER="POSIX"
>   LC_NAME="POSIX"
>   LC_ADDRESS="POSIX"
>   LC_TELEPHONE="POSIX"
>   LC_MEASUREMENT="POSIX"
>   LC_IDENTIFICATION="POSIX"
>   LC_ALL=

This does appear to be the "issue".  My settings are:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

> 
>> There's no output.  But change it like this:
>>
>> $ perl -e 'print "a\nb\0"'| grep -z 'a'
>> a
>> b$
>>
>> It found, and printed, the newline-containing string.  I would suspect
>> the regex engine is still honoring the '. (dot) does not match newline'
>> convention but is OK with literals, if present.
> 

I started a sub-shell, reset all the variables to match yours, and,
bingo, the wildcard worked.

Looking through the list of names, nothing stands out as the 'obvious'
single cause.  In fact, the LC_ names all seem to be specific to things
that would not necessarily affect regex matching.

So I picked LANG as a starting point and reset it, and *only* it, to
empty.  And got lucky: that is, apparently, the variable that affects
how the regex is handled.
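
For anyone who wants to repeat the comparison, here is a minimal sketch
of the test.  The helper name with_lang is made up for illustration, and
the sketch assumes LC_ALL and LC_CTYPE are not set explicitly (either one
would override LANG).  Note that in the locale output above, the values
shown in double quotes are implied rather than set, so in my environment
clearing LANG also changes LC_CTYPE and the other quoted categories; it
may be the character set, not LANG as such, that matters to the regex.

```shell
# Hypothetical helper: run a command with LANG overridden for that
# command only, leaving the calling shell's environment untouched.
with_lang() {
    lang_value=$1
    shift
    env LANG="$lang_value" "$@"
}

# Count the NUL-terminated records matching a.*b under each setting.
# The counts depend on the grep version and on the installed locales,
# so no expected output is shown.  grep exits non-zero when nothing
# matches, hence the "|| true".
printf 'a\nb\0' | with_lang ''            grep -zc 'a.*b' || true
printf 'a\nb\0' | with_lang 'en_US.UTF-8' grep -zc 'a.*b' || true
```

If the two counts differ, the locale is what changes whether '.' is
allowed to match the embedded newline.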

-- 
Bob McGowan
Symantec
US Internationalization

