
Re: Regex Question



Telly Williams wrote:
Hi,

	I'm reading up on Regular Expressions and I have a question about alternation.

	I have the sentences: "There was a dog in the house." & "A house on the hill."

	Both of these are in a file (named "regex") on two different lines.

	My understanding of alternation is that it will look both for what is on the right and left of the pipe.

So, if I type:
		grep in|hill regex

	isn't that supposed to give me both sentences in stdout?

	When I type that, I get nothing at all on return.  I've even typed it as:

		grep in\|hill regex

	and I still get nothing.  What am I doing wrong?  Is what I'm typing and what I'm asking for two different things?  Thanks.

TW



Telly,

I've seen responses from 5 people, and feel compelled (sorry about that ;) to add my two cents' worth. The answers in the other posts are basically correct, but there is more to the story than any of them covered, and without the full context those answers are potentially confusing.

I hope this is helpful for you.

First, as noted by one poster, if you had typed:

    grep in|hill regex

to bash, without any shell-level escapes, that 'pipe' symbol would do exactly what its name says. The shell would attempt to open a pipe from grep to a program called 'hill', which (at least on my system) does not exist, resulting in the error "bash: hill: command not found".

Second, 'grep', having been given the pattern 'in' but no file name, would read standard input and appear to "hang", waiting for you to type something in.

Several posters mentioned the need to 'escape' the pipe symbol. This requires exactly one backslash (just as you did) in front of the pipe symbol, to prevent the shell from interpreting it; quoting the whole pattern has the same effect. Multiple levels of escaping may be needed in some cases, but not here. They would be needed only if, for example, you wanted to search for a literal pipe symbol and had to be sure it was treated as neither a 'pipe' (shell) nor an 'alternation' character (grep).
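
To make the escaping point concrete, here is a small sketch of what the shell actually hands to a command. It uses 'printf' as a stand-in for grep (my choice, purely for illustration) so you can see the argument after the shell has processed it. The variable names are made up for this example.

```shell
# Each spelling below reaches the command as the same single argument: in|hill
escaped=$(printf '%s\n' in\|hill)     # backslash escape, as in the question
single=$(printf '%s\n' 'in|hill')     # single quotes
double=$(printf '%s\n' "in|hill")     # double quotes

printf '%s\n' "$escaped" "$single" "$double"
```

In all three cases the shell consumes the backslash or the quotes, so the program never sees them. What grep then does with the '|' it receives is a separate matter.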

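Next, grep by default supports basic regular expressions, which most would now call a subset of full regular expression syntax. In that syntax a bare '|' is not an alternation operator and bare parentheses do not group; both are ordinary characters. So when you put a backslash in front of the pipe symbol, the shell removes the backslash and 'grep' simply looks for the literal string 'in|hill', which does not exist in the file. Add a line containing that text and you'll see it match. (GNU grep also accepts '\|' as alternation in basic mode, but that is an extension and grep would have to receive the backslash itself.)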
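
If you want to try this without touching your own file, something like the following should do. It is only a sketch: the temporary file stands in for your 'regex' file, and the variable names are mine.

```shell
# Temporary stand-in for the 'regex' file from the question.
tmp=$(mktemp)
printf '%s\n' 'There was a dog in the house.' 'A house on the hill.' > "$tmp"

# Plain grep treats the bare | as a literal character, so nothing matches.
# (grep exits non-zero when there is no match, hence the '|| true'.)
before=$(grep -c 'in|hill' "$tmp" || true)

# Add a line that really contains the text in|hill and search again.
printf '%s\n' 'in|hill' >> "$tmp"
after=$(grep 'in|hill' "$tmp")

echo "matching lines before: $before"
echo "matching line after:   $after"
rm -f "$tmp"
```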

The -E (not -e) option to grep, or using the name 'egrep', selects extended (or enhanced, but the man page uses the word 'extended') regular expressions, as several mentioned. This adds parentheses for grouping as well as alternation, so you can write 'to(day|morrow)' to look for either 'today' or 'tomorrow'. (This is a classic example, borrowed from a book whose title and author I no longer recall.)
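
Here is a short sketch of both uses of -E. Again the temporary file is my stand-in for your 'regex' file, and the three "see you" lines are invented test input.

```shell
# Temporary stand-in for the 'regex' file from the question.
tmp=$(mktemp)
printf '%s\n' 'There was a dog in the house.' 'A house on the hill.' > "$tmp"

# Quoted so the shell leaves the | alone; -E makes grep treat it as alternation.
both=$(grep -E 'in|hill' "$tmp")
count=$(grep -Ec 'in|hill' "$tmp")
echo "$both"

# Grouping plus alternation: matches 'today' or 'tomorrow', not 'tonight'.
words=$(printf '%s\n' 'see you today' 'see you tomorrow' 'see you tonight' |
        grep -E 'to(day|morrow)')
echo "$words"
rm -f "$tmp"
```

The first search prints both sentences, since one contains 'in' and the other 'hill'.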

Finally, someone mentioned 'fgrep' (the "f" is usually read as "fixed", for fixed strings, though some say "fast"; it behaves like 'grep -F'), which does *not* do regex patterns; it looks for literal matches only. Its power lies in the ability to use many "patterns" (newline-separated strings) to search through a list of other newline-separated strings. I've used this, for example, to take a list of file names produced by 'ls' and search for them in a file that may contain some of those names.
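
A rough sketch of that kind of use follows. The file names and log lines are invented for the example, and I've written the name list by hand rather than running 'ls', so the result is predictable.

```shell
dir=$(mktemp -d)

# Hypothetical list of names, one per line (a stand-in for 'ls' output).
printf '%s\n' 'notes.txt' 'a|b.log' > "$dir/names"

# Hypothetical file that mentions some of those names.
printf '%s\n' 'backup of notes.txt done' \
              'skipped report.pdf' \
              'rotated a|b.log' > "$dir/log"

# -F: patterns are fixed strings, so '.' and '|' are matched literally.
# -f: read the patterns, one per line, from the named file.
found=$(grep -F -f "$dir/names" "$dir/log")
echo "$found"
rm -rf "$dir"
```

Because nothing in the patterns is treated as a regex operator, names containing dots, pipes or brackets need no escaping.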

--
Bob McGowan
Symantec, Inc.
