
Re: Off topic question about grep



On 20101109_071001, ~Stack~ wrote:
> Hello everyone!
> 
> I ran into a strange issue with grep and I was hoping someone could
> explain what I feel is an oddity.
> 
> I was trying to match a word that starts with either a _ or a letter
> followed by any number of _, letters, or numbers. (eg: Good = Asdf1,
> _aSD1. Bad: 9_asD ). My test text file is just those three examples,
> each on a new line.
> 
> I first tested with this:
> [_a-zA-Z][_a-zA-Z0-9]
> 
> But that would match against 9_asD which begins with a number (not what
> I wanted). So I tried:
> [_a-zA-Z][_a-zA-Z0-9]*
> 
> I realize that the expression won't do what I mistakenly thought I
> wanted it to do. What is puzzling to me is that my hard disk usage
> peaked, my cpu jumped, and grep took almost two minutes to return an
> exit code of 1 (no match). :-/
> 
> At first I thought it may be an issue with Debian Squeeze (current box)
> so I tried it on Debian Lenny with similar results. Same for an Ubuntu
> Lucid and Fedora 10. So I am pretty sure it is something with grep and
> not just the version of grep.
> 
> I was hoping someone might know why grep behaves so oddly with that
> expression. If it was a monster file or something I could understand
> the system utilization peak, but it is just three lines in a text file.
> 
> Just so you know, I have a working solution. In my case, every instance
> is on a new line so I have a working expression using:
> ^[_a-zA-Z][_a-zA-Z0-9]*$

This last expression anchors the match to both the beginning and the
end of a line. To anchor it to the beginning of a word instead, you need:

\<[_a-zA-Z][_a-zA-Z0-9]*$

Note that the trailing $ still ties the match to the end of the line;
replace it with \> if you want the match to end at a word boundary.
Either way, this will only work if you agree with the implementers of
grep as to what defines the beginning of a word. What is your definition?

Look in 'man grep' for clues as to where you can find the grep
implementers' official definition. I found '\<' in 'man grep' under
'The Backslash Character and Special Expressions'.
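
To see the difference between the anchors, here is a small sketch run
against the three sample lines from the original question. It assumes
GNU grep (where \< and \> are supported as extensions and letters,
digits and underscore count as word characters); the file name
sample.txt is made up for the example.

```shell
# Recreate the three-line test file described in the question.
# (sample.txt is a hypothetical name, not from the original post.)
printf '%s\n' 'Asdf1' '_aSD1' '9_asD' > sample.txt

# Unanchored: 9_asD is also counted, because the match can start at "_a".
unanchored=$(grep -c '[_a-zA-Z][_a-zA-Z0-9]' sample.txt)

# Line-anchored: only lines that consist of a single valid name.
line_anchored=$(grep -c '^[_a-zA-Z][_a-zA-Z0-9]*$' sample.txt)

# Word-anchored: \< can only match before the "9" in 9_asD, since the
# digit is itself a word character, so that line is rejected as well.
word_anchored=$(grep -c '\<[_a-zA-Z][_a-zA-Z0-9]*\>' sample.txt)

echo "unanchored=$unanchored line=$line_anchored word=$word_anchored"
```

The word-anchored form would also accept a valid name that appears in
the middle of a longer line, which the ^...$ form would not.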

HTH
-- 
Paul E Condon           
pecondon@mesanetworks.net

