
Off topic question about grep



Hello everyone!

I ran into a strange issue with grep and I was hoping someone could
explain what I feel is an oddity.

I was trying to match a word that starts with either a _ or a letter,
followed by any number of _, letters, or digits (e.g. good: Asdf1 and
_aSD1; bad: 9_asD). My test text file is just those three examples,
each on its own line.

I first tested with this:
[_a-zA-Z][_a-zA-Z0-9]

But that would match against 9_asD which begins with a number (not what
I wanted). So I tried:
[_a-zA-Z][_a-zA-Z0-9]*
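
For reference, here is a minimal sketch of the test setup. The exact
command line is not shown above, so the quoting, the temporary file,
and the variable names are all assumptions made for illustration. The
patterns are single-quoted so that the shell passes them to grep
unchanged.

```shell
# Hypothetical reproduction: the temp file stands in for the test file.
export LC_ALL=C                      # keep ranges such as a-z byte-based
sample=$(mktemp)
printf '%s\n' 'Asdf1' '_aSD1' '9_asD' > "$sample"

# Single quotes stop the shell from treating [ ] and * as glob characters.
first='[_a-zA-Z][_a-zA-Z0-9]'
second='[_a-zA-Z][_a-zA-Z0-9]*'

# -c counts matching lines. Neither pattern is anchored, so a match may
# start anywhere in a line (for example at the "_" in 9_asD).
first_count=$(grep -c -- "$first" "$sample")
second_count=$(grep -c -- "$second" "$sample")
echo "first: $first_count, second: $second_count"
```

Run this way, both counts should come out as 3, since each line
contains at least one letter or underscore that the first bracket
expression can match.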

I realize now that this expression does not do what I wanted either.
What puzzles me is that my hard disk usage peaked, my CPU usage jumped,
and grep took almost two minutes to return an exit code of 1 (no
match). :-/

At first I thought it might be an issue with Debian Squeeze (my current
box), so I tried it on Debian Lenny, with similar results. The same
happened on Ubuntu Lucid and Fedora 10. So I am pretty sure it is
something about grep itself and not just one version of grep.

I was hoping someone might know why grep behaves so oddly with that
expression. If it were a monster file or something, I could understand
the system utilization peak, but it is just three lines in a text file.

Just so you know, I have a working solution. In my case, every instance
is on a new line so I have a working expression using:
^[_a-zA-Z][_a-zA-Z0-9]*$
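
A minimal sketch of that anchored expression in use, assuming the
three sample lines sit in a temporary file (the file and variable
names are placeholders, not taken from my actual session):

```shell
# Hypothetical check of the anchored pattern against the three samples.
export LC_ALL=C                      # keep ranges such as a-z byte-based
sample=$(mktemp)
printf '%s\n' 'Asdf1' '_aSD1' '9_asD' > "$sample"

# ^ and $ force the whole line to match, so a line that starts with a
# digit (9_asD) can no longer match from the middle of the line.
matches=$(grep -- '^[_a-zA-Z][_a-zA-Z0-9]*$' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"                      # remove the temporary file
```

With the anchors in place, only Asdf1 and _aSD1 should be printed.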

I am just curious about the odd behavior.

Thanks!
