
Bash, sed: extracting regex subexpressions



Hi,

I've been looking for a command I can use in bash scripts that will do
something like this:

$COMMAND(n[,m...]) (REGEX-1)(REGEX-2)[...] <($FILE)

        (MATCH-n)[(MATCH-m)...]

In other words, to output only those parts of a regular expression match
that correspond to the specified subexpressions.

As a trivial example:

ifconfig | $COMMAND(2) '(inet addr:)([^ ]+)( .*)'

        192.168.1.10

Some invocations of grep, awk and sed use backreferences, but AFAIK you can't
get just the backreferences as output. It would be simple if grep -o could have
subexpression indices, like:

egrep -o(2) '(foo)(.*)(bar)'

to get the matches for (.*); or if awk did something like this:

mawk '/(foo)(.*)(bar)/ {print \2}'

in other words, treating backreferences as pseudo-variables, but it doesn't,
AFAIK.

What I wanted can be done with grep plus sed, or multiple greps, or
awk using regexes as field separators, etc. but I wondered if there was a
neat way to do it with one command and without having to repeat regexes. It's
something that comes up from time to time in admin scripts and I've seen
posts here and there asking this kind of question.
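
For illustration, here is a rough sketch of two of those workarounds. The
sample line is made up (old net-tools ifconfig style), and grep -o is assumed
to be available (it is in GNU and BSD grep). Note how the grep version has to
repeat the 'inet addr:' pattern in the second command:

```shell
# Made-up sample line; real ifconfig output varies between systems.
line='inet addr:192.168.1.10  Bcast:192.168.1.255'

# grep plus sed: grep -o keeps only the matched text, then sed strips the
# prefix, which means the 'inet addr:' pattern is written twice.
via_grep=$(printf '%s\n' "$line" | grep -oE 'inet addr:[^ ]+' | sed 's/inet addr://')

# awk with a regex as field separator: the wanted text ends up in field 2.
via_awk=$(printf '%s\n' "$line" | awk -F'inet addr:| +Bcast:' '{print $2}')

printf '%s\n%s\n' "$via_grep" "$via_awk"
```

Both work, but neither lets you simply say "give me subexpression 2".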

Anyway, I found one:

sed -nr 's/(foo)(.*)(bar)/\2/p'

The -n suppresses sed's automatic printing of every line, -r enables
extended regexes, the backreference in the replacement selects the
subexpression, and the p flag at the end prints only the lines where the
substitution actually happened.
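
A quick check of that behaviour on some made-up input, as a sketch only. It
uses -E, which GNU and BSD sed both accept as the same thing as -r; if your
sed only knows -r, substitute that:

```shell
# Three made-up input lines; only the first and third contain foo...bar.
input='foo123bar
no match here
fooXYZbar'

# -n: no automatic printing; \2: keep only the second group;
# p: print the line only if the substitution succeeded.
out=$(printf '%s\n' "$input" | sed -nE 's/(foo)(.*)(bar)/\2/p')

printf '%s\n' "$out"
```

The line without a match is dropped, and the other two are reduced to
whatever (.*) captured.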

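Not as neat as the imaginary grep or awk features above, because the
regex has to match the whole line, including the parts you don't want:
anything outside the match is left in place and printed too. Adding .*
at the start and end of the regex takes care of that.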
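
To make the whole-line point concrete, a small sketch with a made-up input
line (again using -E as the equivalent of -r). Without the surrounding
wildcards, the text around the match survives the substitution:

```shell
# Made-up line with extra text before foo and after bar.
line='xx foo123bar yy'

# Regex matches only part of the line: the unmatched text is kept.
partial=$(printf '%s\n' "$line" | sed -nE 's/(foo)(.*)(bar)/\2/p')

# Leading and trailing .* swallow the rest, leaving just group 2.
whole=$(printf '%s\n' "$line" | sed -nE 's/.*(foo)(.*)(bar).*/\2/p')

printf '[%s]\n[%s]\n' "$partial" "$whole"
```

One thing to watch: .* is greedy, so if foo or bar occurs more than once on
a line, the leading .* will skip ahead to the last foo.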

Any better ideas?

John

