
Re: Regex Question



On Fri, Jul 13, 2007 at 15:49:23 -0600, Telly Williams wrote:
> Hi,
> 
> I'm reading up on Regular Expressions and I have a question about
> alternation.
>
> I have the sentences: "There was a dog in the house." & "A house on
> the hill."
>
> Both of these are in a file (named "regex") on two different lines.
>
> My understanding of alternation is that it will look both for what is
> on the right and left of the pipe.
>
> So, if I type: 
>
>	grep in|hill regex
>
> isn't that supposed to give me both sentences in stdout?
> 
> When I type that, I get nothing at all on return.  I've even typed it
> as:
> 
> 	grep in\|hill regex
> 
> and I still get nothing.  What am I doing wrong?  Is what I'm typing
> and what I'm asking for two different things?  Thanks.

The problem is that your shell interprets the special characters "\" and
"|" according to its own rules before it passes the argument on to the
grep command. This means that the matching pattern processed by grep is
not necessarily exactly the string that you have typed.

If you are new to these things, then it is probably best to enclose
your match pattern in single quotes, because this keeps the shell from
interfering. Then you can focus on figuring out the finer points of
regular expressions. This should work:

grep 'in\|hill' regex

(Be sure to use the straight single quotes and not one of these: `´.)
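To try this without touching your own files, here is a small sketch that
recreates the two lines from your question in a temporary file. The
variable names "sample" and "matches" are just placeholders of mine, and
I am assuming GNU grep, which understands "\|" as alternation in its
default (basic) mode.

```shell
# Recreate the two-line file from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'There was a dog in the house.' 'A house on the hill.' > "$sample"

# Single quotes keep the shell from touching the backslash or the pipe,
# so grep receives the pattern in\|hill exactly as typed.
# "|| true" only keeps the script going if grep finds nothing.
matches=$(grep 'in\|hill' "$sample" || true)
printf '%s\n' "$matches"

rm -f "$sample"
```

With GNU grep both sentences should be printed, the first because of
"in" and the second because of "hill".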

For further illustration, this is how you have to do it in bash without
quotes:

grep in\\\|hill regex

Here's why: The first backslash is interpreted by bash as a special
character which "escapes" the next character. More specifically, the
first backslash tells bash NOT to interpret the second backslash as
special, but rather to pass it on to the command. (It is perfectly
normal to be confused by this in the beginning.) The third backslash
serves the same purpose as the first one, but now it escapes the pipe
"|", which would otherwise be understood by bash as a way to feed the
output of

grep in\\

to a command named "hill" with the argument "regex". (This will just
give you an error, of course, since there is no such command.)

With this in mind you can see why your previous attempt failed:

grep in\|hill regex

means that bash removes the backslash and grep sees the match pattern
"in|hill". Without a backslash in front of it, grep treats "|" as an
ordinary character, so this pattern only matches lines that contain the
literal string "in|hill".
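If you want to see for yourself what the shell hands over, you can
replace grep with printf, which simply prints the argument it receives.
This is only a sketch for bash or another POSIX-style shell; the
variable names are made up for the illustration.

```shell
# printf '%s' prints its argument exactly as the shell passes it on,
# so each variable holds what grep would have received as its pattern.
quoted=$(printf '%s' 'in\|hill')    # single quotes: nothing is changed
escaped=$(printf '%s' in\\\|hill)   # \\ becomes \ and \| becomes |
single=$(printf '%s' in\|hill)      # \| becomes |, the backslash is lost

printf 'single quotes:     %s\n' "$quoted"
printf 'three backslashes: %s\n' "$escaped"
printf 'one backslash:     %s\n' "$single"
```

The first two lines of output show the same pattern, and the third one
shows that the backslash never reaches the command.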

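As others have already pointed out, another pitfall is the difference
between basic and extended regular expressions, i.e. between grep and
egrep (or grep -E). In extended regular expressions a plain "|" means
alternation, whereas in basic ones GNU grep expects "\|" for it.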
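As a rough comparison of the two modes, the following sketch runs the
same unescaped pattern once with grep -E and once with plain grep. The
temporary file and the variable names are my own placeholders, not
anything from your setup.

```shell
# Same two sentences as in the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'There was a dog in the house.' 'A house on the hill.' > "$sample"

# Extended mode: a plain | is alternation, so both lines match.
ere_matches=$(grep -E 'in|hill' "$sample" || true)

# Basic mode: | is an ordinary character, so grep looks for the
# literal text in|hill, which neither line contains.
bre_matches=$(grep 'in|hill' "$sample" || true)

printf 'grep -E:\n%s\n' "$ere_matches"
printf 'grep:\n%s\n' "$bre_matches"

rm -f "$sample"
```

Note that the pattern is still in single quotes in both cases, otherwise
the shell would treat the "|" as a pipe before grep ever sees it.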

-- 
Regards,            | http://users.icfo.es/Florian.Kulzer
          Florian   |


