Bash, sed: extracting regex subexpressions

To: "Debian user" <debian-user@lists.debian.org>
Subject: Bash, sed: extracting regex subexpressions
From: "John O'Hagan" <johnmohagan@gmail.com>
Date: Tue, 27 May 2008 14:49:59 +0000
Message-id: <7baa87df0805270749y528498bas672233931eea4e8c@mail.gmail.com>

Hi,

I've been looking for a command I can use in bash scripts that will do
something like this:

$COMMAND(n[,m...]) (REGEX-1)(REGEX-2)[...] <($FILE)

        (MATCH-n)[(MATCH-m)...]

In other words, to output only the parts of a regular expression match which
match specified subexpressions.

As a trivial example:

ifconfig | $COMMAND(2) '(inet addr:)([^ ]+)( .*)'

        192.168.1.10

Some invocations of grep, awk and sed use backreferences, but AFAIK you can't
get just the backreferences as output. It would be simple if grep -o could have
subexpression indices, like:

egrep -o(2) '(foo)(.*)(bar)'

to get the matches for (.*); or if awk did something like this:

mawk '/(foo)(.*)(bar)/ {print \2}'

in other words, treating backreferences as pseudo-variables, but it doesn't,
AFAIK.

What I wanted can be done with grep plus sed, or multiple greps, or
awk using regexes as field separators, etc. but I wondered if there was a
neat way to do it with one command and without having to repeat regexes. It's
something that comes up from time to time in admin scripts and I've seen
posts here and there asking this kind of question.
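
For comparison, here is a rough sketch of two of those workarounds applied
to the ifconfig example. The sample line is an assumed stand-in for real
output in the old net-tools format, and the variable names are only for
illustration.

```shell
# Assumed sample line in the old net-tools ifconfig format.
line='          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0'

# grep plus sed: grep -o keeps the whole match, then sed strips the prefix,
# so "inet addr:" has to be written twice.
two_step=$(printf '%s\n' "$line" | grep -oE 'inet addr:[^ ]+' | sed 's/^inet addr://')

# awk with a regex as the field separator: the address ends up in field 2.
awk_fs=$(printf '%s\n' "$line" | awk -F 'inet addr:| +Bcast:' '/inet addr:/ {print $2}')

printf '%s\n%s\n' "$two_step" "$awk_fs"
```

Both work, but the first repeats part of the regex and the second depends
on knowing what comes after the address.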

Anyway, I found one:

sed -nr 's/(foo)(.*)(bar)/\2/p'

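The -n suppresses sed's default printing of every line, -r enables extended
regexes (GNU sed) so the parentheses need no escaping, backreferences in the
replacement let you choose subexpressions, and the p flag at the end prints
only the lines where the substitution succeeded.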
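
A small sketch of that command in action, assuming GNU sed (for -r) and
some made-up input with two matching lines and one that doesn't match:

```shell
# Hypothetical input: two matching lines and one non-matching line.
input='foo123bar
no match here
fooXYZbar'

# The command from above, unchanged: only lines where the substitution
# succeeds are printed, reduced to the second subexpression.
middle=$(printf '%s\n' "$input" | sed -nr 's/(foo)(.*)(bar)/\2/p')
printf '%s\n' "$middle"
```

This should print 123 and XYZ on separate lines, with the non-matching
line dropped.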

Not as neat as the imaginary grep or awk features above, because the
pattern has to match the whole line, even the parts you don't want
(otherwise they are left in the output), but a .* at either end of the
regex takes care of that.
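
To illustrate the point about matching the whole line, here is a sketch
with an invented line that has text on both sides of the match (again
assuming GNU sed; the variable names are only for illustration):

```shell
# Hypothetical line with unwanted text before "foo" and after "bar".
line='xx foo123bar yy'

# Without wildcards, only the matched part is replaced, so the
# surrounding text survives in the output.
partial=$(printf '%s\n' "$line" | sed -nr 's/(foo)(.*)(bar)/\2/p')

# With .* on both sides, the regex consumes the whole line and only
# the second subexpression is left.
whole=$(printf '%s\n' "$line" | sed -nr 's/.*(foo)(.*)(bar).*/\2/p')

printf '[%s]\n[%s]\n' "$partial" "$whole"
```

The first result is "xx 123 yy" and the second is just "123". Note that .*
is greedy, so if "foo" or "bar" occurs more than once on a line, the
choice of match may not be the one you expect.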

Any better ideas?

John
