Re: Bash, sed: extracting regex subexpressions



> On Tue, May 27, 2008 at 4:49 PM, John O'Hagan <johnmohagan@gmail.com> wrote:
> > Hi,
> >
> > I've been looking for a command I can use in bash scripts that will do
> > something like this:
> >
> > $COMMAND(n[,m...]) (REGEX-1)(REGEX-2)[...] <($FILE)
> >
> >        (MATCH-n)[(MATCH-m)...]
>

Thanks for the tips; they all work. 

I tried each approach for a time-intensive task: finding palindromes within 
words in a dictionary file $DICT, using an identical regex in each case. 
Below are the expressions used and the times they took to execute:
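Before the timings, a note on how the groups in that regex are numbered, since the output depends on it. This is only a sketch: it assumes bash 3.2 or later and a regex library that accepts back-references in extended regexes (glibc does, POSIX does not require it). The word "afternoon" is an illustrative input, not something from $DICT.

```shell
#!/usr/bin/env bash
# Group 1 is the whole line and group 2 is the palindrome found in it.
# Groups 3, 4 and 6 hold the characters that \3, \4 and \6 mirror.
re='(.*((.)(.?)((.)\6?)\4\3).*)'
word=afternoon   # illustrative input, chosen because it holds one palindrome

if [[ $word =~ $re ]]; then
    # prints: afternoon -> noon
    printf '%s -> %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
fi
```

The optional pieces (.?) and \6? are what let the same pattern cover palindromes of three to six characters.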

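# bash 3.2 and later treat a quoted pattern after =~ as a literal string,
# so the regex is kept in a variable and used unquoted.
re='(.*((.)(.?)((.)\6?)\4\3).*)'

while read -r i ; do

[[ $i =~ $re ]] && echo "$BASH_REMATCH ${BASH_REMATCH[2]}"

done < "$DICT"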

#real    1m41.239s
#user    1m17.383s
#sys     0m0.474s

--------


sed -nr 's/(.*((.)(.?)((.)\6?)\4\3).*)/\1 \2/p' $DICT

#real    1m6.151s
#user    0m46.763s
#sys     0m0.151s

-------


 perl -ne '$_ =~ /(.*((.)(.?)((.)\6?)\4\3).*)/; print "$1, $2\n"' < $DICT


#real    0m16.381s
#user    0m4.660s
#sys     0m0.482s

--------

So I guess Perl is the clear winner, unless the above comparison is somehow 
unfair?

Regards,

John

