
Re: .*? still greedy (perl,python)?



Marco Pantaleoni wrote:
> 
> On Wed, Aug 30, 2000 at 02:16:36PM +0300, joost witteveen wrote:
> >
> > Hi,
> > I noticed that .*? in perl and python, if used at the beginning
> > of an unanchored (without ^) regular expression, still behaves half-greedy:
> >
> > perl:
> >   "hello" =~ /(.*?)l/;
> >   print "\$1 = $1\n";
> >
> > python:
> >   import re
> >   r=re.search("(.*?)l","hello")
> >   print "$1=",r.group(1)
> >
> > In both cases, "he" is printed, while the real minimal match obviously
> > would result in an empty $1 or r.group(1). This is the behaviour
> > if I put the (.*?) at the end (ex.: /h(.*?)/)
> 
> Why? The regexp you are using imposes the constraint of having an 'l'
> following the parenthesized subexpression. Only "he" satisfies this
> constraint.
> 
I was going to say that as well, but I can see where he's coming from. 
Actually, matching against just the 'l' *would* satisfy the constraint,
since '*' can match zero characters.

Here's perldoc perlre:

"By default, a quantified subpattern is ``greedy'', that is, it will
match as many times as possible (given a particular starting location)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
while still allowing the rest of the pattern to match. If you want it
to match the minimum number of times possible, follow the quantifier
with a ``?''."

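And that's the answer: non-greediness doesn't mean the search is going
to try different *starting* locations to get a more minimal match. The
engine takes the leftmost starting location at which the whole pattern
can match, and only then does (.*?) match as little as possible from
there. Starting at the "h", the least it can take before an 'l' is
"he". I.e., greediness (or lack thereof) goes rightward, not leftward.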
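
To make the point concrete, here is a small sketch in Python 3 syntax (the
original post used Python 2 print statements). The variable names are mine
and purely illustrative. It compares the unanchored search with one that is
told to start at the first 'l', using the optional pos argument of a
compiled pattern's search():

```python
import re

text = "hello"
pattern = re.compile(r"(.*?)l")

# Unanchored search: the engine tries starting location 0 first and
# accepts the first match it finds there, so the lazy group grows
# rightward until an 'l' follows it.
m = pattern.search(text)
lazy_group = m.group(1)
lazy_start = m.start()
print(repr(lazy_group), lazy_start)

# Same pattern, but the search begins at index 2 (the first 'l').
# From this starting location the lazy group can stay empty.
m2 = pattern.search(text, 2)
shifted_group = m2.group(1)
shifted_start = m2.start()
print(repr(shifted_group), shifted_start)

# Greedy version for comparison: same starting location as the first
# search, but the group takes as much as it can.
m3 = re.search(r"(.*)l", text)
greedy_group = m3.group(1)
print(repr(greedy_group))
```

The first search reports "he" starting at 0, the second an empty group
starting at 2, and the greedy one "hel". So the "?" only changes how far
the group extends to the right from a given start; it has no say in where
the match begins.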


