
Re: grep / sed + regex : possible bug ?



On Thu, Mar 20, 2003 at 09:20:08AM +0100, Axel Schlicht wrote:
> Hi Colin
> That makes it way clearer now.
> 
> So I am safe to assume that the regexes used in sed / grep et al. are
> quite different from those encountered in automata theory?

No, they're pretty similar if you ignore the minor syntactic
differences. The ones in automata theory are typically just forced to
match over the whole input (or implicitly anchored to the start and end,
if you prefer to think about it that way), that's all. POSIX regular
expressions are more flexible in that you can do that if you like, but
you don't have to.
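
A minimal sketch of that distinction with grep, assuming a POSIX grep and a
made-up sample line (the variable names and the input are illustrative only):

```shell
# Hypothetical sample line: "abc" with extra characters on both sides.
line='xxabcxx'

# Unanchored: succeeds because "abc" occurs somewhere in the line.
if printf '%s\n' "$line" | grep -q 'abc'; then
    unanchored=match
else
    unanchored=nomatch
fi

# Explicitly anchored with ^ and $: the regex must cover the whole line,
# which is the convention usually assumed in automata theory.
if printf '%s\n' "$line" | grep -q '^abc$'; then
    anchored=match
else
    anchored=nomatch
fi

printf 'unanchored: %s\n' "$unanchored"
printf 'anchored:   %s\n' "$anchored"
```

The anchors are something you opt into; without them the regex may match any
substring of the line.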

> > You've definitely misunderstood how unanchored regular expressions work.
> > In general, tools that handle regular expressions do *not* require them
> > to match all the input, so your "state invalid" actually means "we ran
> > off the end of the regular expression before we ran out of input, but
> > that's OK".
> 
> So in fact it (my regex) is taken as
> .*/Name/[^/][^/]*.*
> (note the additional .*) as long as I don't quite clearly say it has
> to stop, i.e. add an anchor.

No, not quite. You're still hung up on the idea that the regex is always
matched over the whole input. This is not so. Implicit addition of .* at
either end produces a different regex; you'll see the difference when
you try to use it in an s/// sed command, for example.
s|/Name/[^/][^/]*|| (using s||| for clarity to avoid having to backslash
all the literal / characters) will delete only "/Name/whatever", while
s|.*/Name/[^/][^/]*.*|| will delete the whole string, at least up to a
newline.
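
To see the two commands side by side, here is a small sketch assuming a POSIX
sed; the sample path and the variable names are made up for illustration and
do not come from the original question:

```shell
# Hypothetical sample input; any line containing /Name/<something> would do.
input='/home/Name/foo/bar'

# Unanchored regex: only the matched portion "/Name/foo" is deleted.
unanchored=$(printf '%s\n' "$input" | sed 's|/Name/[^/][^/]*||')

# With .* added at both ends the match covers the whole line,
# so the substitution deletes everything.
padded=$(printf '%s\n' "$input" | sed 's|.*/Name/[^/][^/]*.*||')

printf 'unanchored: [%s]\n' "$unanchored"
printf 'padded:     [%s]\n' "$padded"
```

With this input the first command should leave "/home/bar" and the second an
empty line, which is exactly the difference between matching a substring and
matching the whole input.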

Cheers,

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


