Re: match across line using grep
On 08/03/2010 11:28 AM, Andre Majorel wrote:
> On 2010-08-03 09:57 -0700, Bob McGowan wrote:
>> On 08/03/2010 05:39 AM, Andre Majorel wrote:
>>> On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
>>>> On 2010-08-03 17:53, Andre Majorel wrote:
>>>>>>> $ printf 'a\nb' | grep -zo a.*b
>>>>>>>
<--deleted-->
> Fun, eh ? Maybe the answer is in there :
>
> $ locale
> LANG=
> LC_CTYPE=en_US
> LC_NUMERIC="POSIX"
> LC_TIME="POSIX"
> LC_COLLATE=C
> LC_MONETARY="POSIX"
> LC_MESSAGES="POSIX"
> LC_PAPER="POSIX"
> LC_NAME="POSIX"
> LC_ADDRESS="POSIX"
> LC_TELEPHONE="POSIX"
> LC_MEASUREMENT="POSIX"
> LC_IDENTIFICATION="POSIX"
> LC_ALL=
This does appear to be the "issue". My settings are:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
>
>> There's no output. But change it like this:
>>
>> $ perl -e 'print "a\nb\0"'| grep -z 'a'
>> a
>> b$
>>
>> It found, and printed, the string containing the newline. I suspect
>> the regex engine is still honoring the '. (dot) does not match
>> newline' convention but is OK with literals, if present.
>
I started a sub-shell, reset all the variables to match yours, and,
bingo, the wildcard worked.
Looking through the list of names, nothing stands out as the single
obvious contributor. In fact, the LC_ names all seem to be specific to
things that would not necessarily affect the regex operation.
So I picked LANG as a starting point and reset only that one to empty.
And got lucky: that is, apparently, the variable that affects how the
regex is handled.
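For anyone who wants to repeat the comparison, here is a minimal sketch
of the experiment. It assumes a grep that supports -z (GNU grep does),
and with_empty_lang is a hypothetical helper name of mine, not anything
from grep or the shell. I am deliberately not showing expected output,
since the result seems to depend on the grep version and on which
locales are installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): run a command in a
# subshell with only LANG cleared, so the calling shell is untouched.
with_empty_lang() (
    LANG=
    export LANG
    "$@"
)

# Same input and same pattern in two environments.
# tr turns the NUL record terminator into a newline for display.
echo "current LANG=${LANG:-}:"
printf 'a\nb\0' | grep -z 'a.*b' | tr '\0' '\n'

echo "LANG cleared:"
printf 'a\nb\0' | with_empty_lang grep -z 'a.*b' | tr '\0' '\n'
```

If the two runs differ, LANG is the likely cause. Note that in the
locale output the values shown in double quotes are the ones implied by
LANG rather than set explicitly, so clearing LANG changes those too,
LC_CTYPE among them. If LC_ALL is set it overrides everything, and
clearing LANG alone would make no difference.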
--
Bob McGowan
Symantec
US Internationalization