Re: Off topic question about grep
On 20101109_071001, ~Stack~ wrote:
> Hello everyone!
>
> I ran into a strange issue with grep and I was hoping someone could
> explain what I feel is an oddity.
>
> I was trying to match a word that starts with either an _ or a letter,
> followed by any number of _, letters, or numbers (e.g. good: Asdf1,
> _aSD1; bad: 9_asD). My test text file is just those three examples,
> each on its own line.
>
> I first tested with this:
> [_a-zA-Z][_a-zA-Z0-9]
>
> But that matched 9_asD, which begins with a number (not what
> I wanted). So I tried:
> [_a-zA-Z][_a-zA-Z0-9]*
>
> I realize that the expression won't do what I mistakenly thought I
> wanted it to do. What is puzzling to me is that my hard disk usage
> peaked, my CPU jumped, and grep took almost two minutes to return an
> exit code of 1 (no match). :-/
>
> At first I thought it might be an issue with Debian Squeeze (current box),
> so I tried it on Debian Lenny with similar results. Same for Ubuntu
> Lucid and Fedora 10. So I am pretty sure it is something with grep and
> not just the version of grep.
>
> I was hoping someone might know why grep behaves so oddly with that
> expression. If it were a monster file or something I could understand
> the system utilization peak, but it is just three lines in a text file.
>
> Just so you know, I have a working solution. In my case, every instance
> is on a new line so I have a working expression using:
> ^[_a-zA-Z][_a-zA-Z0-9]*$
This last expression anchors the match to the whole line: '^' ties it to the beginning of the line and '$' to the end.
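A minimal sketch of the difference, assuming GNU grep and a POSIX shell; words.txt is a hypothetical scratch file holding the three test lines. The patterns are single-quoted so the shell passes the brackets and the '*' to grep unchanged rather than treating them as a filename glob. The -o option prints only the part of each line that matched, which shows why the unanchored pattern accepts 9_asD: grep needs only some substring of the line to match, and '_asD' does.

```shell
#!/bin/sh
# words.txt is a hypothetical scratch file: the three test words, one per line.
printf '%s\n' Asdf1 _aSD1 9_asD > words.txt

# Unanchored: a line matches if ANY substring matches, so all three lines print.
grep '[_a-zA-Z][_a-zA-Z0-9]*' words.txt

# -o prints just the matched part; for 9_asD that part is "_asD".
grep -o '[_a-zA-Z][_a-zA-Z0-9]*' words.txt

# Anchored at both ends: the whole line must match, so 9_asD is rejected.
grep '^[_a-zA-Z][_a-zA-Z0-9]*$' words.txt
```

Run it from a scratch directory, since it overwrites words.txt.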
To anchor an expression to the beginning of a word you need:
\<[_a-zA-Z][_a-zA-Z0-9]*$
but this will only work if you agree with the grep implementers about what
defines the beginning of a word. What is your definition?
Look in 'man grep' for clues as to where you can find the grep
implementers' official definition. I found '\<' in 'man grep' under
'The Backslash Character and Special Expressions'.
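A similar sketch for the word-anchored pattern, again assuming GNU grep. Its manual (under the -w option) counts letters, digits and the underscore as word-constituent characters; under that definition 9_asD is a single word that begins with '9', so no word starts at its '_' and the line is rejected. The two extra lines fed through the pipe are made-up examples showing that, unlike '^', '\<' also accepts a word that does not begin the line, while the '$' kept from the pattern above still forces the match to run to the end of the line.

```shell
#!/bin/sh
# words.txt is a hypothetical scratch file: one test word per line.
printf '%s\n' Asdf1 _aSD1 9_asD > words.txt

# \< matches the empty string at the start of a word. With letters, digits
# and '_' as word characters, the only word start in 9_asD is before the '9',
# so only Asdf1 and _aSD1 are printed.
grep '\<[_a-zA-Z][_a-zA-Z0-9]*$' words.txt

# Made-up lines: the word need not begin the line, so "see _aSD1" is printed,
# while "see 9_asD" is still rejected.
printf '%s\n' 'see _aSD1' 'see 9_asD' | grep '\<[_a-zA-Z][_a-zA-Z0-9]*$'
```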
HTH
--
Paul E Condon
pecondon@mesanetworks.net