Re: Bash, sed: extracting regex subexpressions

To: Debian user <debian-user@lists.debian.org>
Subject: Re: Bash, sed: extracting regex subexpressions
From: John O'Hagan <research@johnohagan.com>
Date: Wed, 28 May 2008 09:23:11 +0000
Message-id: <200805280923.13102.research@johnohagan.com>
In-reply-to: <81c921f30805270827u57563b9ag35af7d6b25a89fa7@mail.gmail.com>
References: <7baa87df0805270749y528498bas672233931eea4e8c@mail.gmail.com> <81c921f30805270827u57563b9ag35af7d6b25a89fa7@mail.gmail.com>

> On Tue, May 27, 2008 at 4:49 PM, John O'Hagan <johnmohagan@gmail.com> wrote:
> > Hi,
> >
> > I've been looking for a command I can use in bash scripts that will do
> > something like this:
> >
> > $COMMAND(n[,m...]) (REGEX-1)(REGEX-2)[...] <($FILE)
> >
> >        (MATCH-n)[(MATCH-m)...]
>

Thanks for the tips; they all work. 

I timed each approach on a deliberately slow task: finding palindromes of
three to six characters inside the words of a dictionary file $DICT, using
the same regex in each case. The commands and their run times are below.

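# Keep the regex in a variable: since bash 3.2, quoting the pattern after =~
# makes it match as a literal string. Back-references in =~ also rely on the
# C library's regex code accepting them in extended regexes (glibc does).
re='(.*((.)(.?)((.)\6?)\4\3).*)'

while read -r i ; do
    [[ $i =~ $re ]] && echo "$BASH_REMATCH ${BASH_REMATCH[2]}"
done < "$DICT"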

#real    1m41.239s
#user    1m17.383s
#sys     0m0.474s

--------

sed -nr 's/(.*((.)(.?)((.)\6?)\4\3).*)/\1 \2/p' "$DICT"

#real    1m6.151s
#user    0m46.763s
#sys     0m0.151s

--------

perl -ne 'print "$1, $2\n" if /(.*((.)(.?)((.)\6?)\4\3).*)/' < "$DICT"

#real    0m16.381s
#user    0m4.660s
#sys     0m0.482s

--------

So I guess Perl is the clear winner, unless the comparison above is somehow
unfair?
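
For anyone who wants to repeat the comparison, here is a minimal sketch of a
timing wrapper for bash. The function name time_palindromes is made up, and
sending the matches to /dev/null is my own assumption here, to keep terminal
output speed out of the numbers; I am not claiming this is exactly how the
figures above were taken. It assumes GNU sed (for -r and back-references).

```shell
# time_palindromes FILE: time the sed and Perl versions over FILE.
# Uses bash's "time" keyword, which reports on stderr.
# Matches are discarded so only the regex work is measured (an assumption).
time_palindromes() {
    echo "sed:"
    time sed -nr 's/(.*((.)(.?)((.)\6?)\4\3).*)/\1 \2/p' "$1" > /dev/null

    echo "perl:"
    time perl -ne 'print "$1 $2\n" if /(.*((.)(.?)((.)\6?)\4\3).*)/' "$1" > /dev/null
}

# Usage: time_palindromes "$DICT"
```

Running each command more than once and comparing the later runs would also
help rule out disk caching as a factor.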

Regards,

John
