
Bug#181872: Patch



On Thu, Mar 13, 2003 at 07:58:05PM +0100, Josip Rodin wrote:
> On Thu, Mar 13, 2003 at 06:46:35PM +0100, Frank Lichtenheld wrote:
> > > The right fix would be simply
> > > $long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
> > > 
> > > Right?
> > 
> > Yours would do also. The main difference in result is that you delete
> > the 'URL:' while mine preserves it. Only a cosmetic difference.
> 
> Actually I did that off the top of my head, focusing on the [^>] part.
> I thought that the "URL:" part was included in the anchor, but I guess
> that's handled by some other part of the code.

Ok. Let's elaborate a little. Sorry if it's too long.

$long_desc =~ s,<((URL:)?http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
                 ^^    ^ ^                ^
                 12    2 X                1

That's the original regex. $1 is the first capture group, i.e. everything
that's matched between '(' 1 and ')' 1. The problem in the bug was
that no whitespace was allowed at point X, so I inserted \s* at this place.

$long_desc =~ s,<((URL:)?\s*http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
                         ^^^
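
To see the difference, here is a minimal sketch. The sub names and the
sample description are made up for illustration; only the two regexes are
taken from above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the original regex (no whitespace allowed at X).
sub escape_url_original {
    my ($long_desc) = @_;
    $long_desc =~ s,<((URL:)?http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
    return $long_desc;
}

# Hypothetical helper: applies the patched regex (\s* inserted at X).
sub escape_url_patched {
    my ($long_desc) = @_;
    $long_desc =~ s,<((URL:)?\s*http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
    return $long_desc;
}

# Made-up sample with a space after 'URL:'.
my $sample = 'Homepage: <URL: http://example.org/pkg/>';

print escape_url_original($sample), "\n";  # left unchanged, the regex does not match
print escape_url_patched($sample),  "\n";  # Homepage: &lt;URL: http://example.org/pkg/&gt;
```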

In your regex

$long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
                 ^         ^ ^            ^ ^
                 1         1 2            2 Y

only what's between '(' 2 and ')' 2 is captured (because the '?:' makes
the first parentheses a non-capturing group). So the 'URL:' is discarded.
Whether you write [\S~-]+?> or [^>]+> should make no big difference (you
are allowing more chars, i.e. whitespace), especially because the first
one is a non-greedy match. The \s* at Y is a good addition by you.
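
A small sketch of that cosmetic difference, again with made-up sub names
and a made-up sample string; the two regexes are the ones shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: my variant, 'URL:' sits inside capture group 1.
sub escape_keep_prefix {
    my ($long_desc) = @_;
    $long_desc =~ s,<((URL:)?\s*http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
    return $long_desc;
}

# Hypothetical helper: your variant, 'URL:' is in a non-capturing group.
sub escape_drop_prefix {
    my ($long_desc) = @_;
    $long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
    return $long_desc;
}

my $sample = 'See <URL: http://example.org/pkg> for details.';

print escape_keep_prefix($sample), "\n";  # See &lt;URL: http://example.org/pkg&gt; for details.
print escape_drop_prefix($sample), "\n";  # See &lt;http://example.org/pkg&gt; for details.
```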

> > > > +               $long_desc =~ s/\&/\&amp\;/go;
> The problem is that if someone puts a proper &amp; in a URL, your regexp
> would happily convert it to &amp;amp; :)

But why would someone do this? The main place where a long description
is displayed is a package manager (dselect/aptitude), not a website. I
would consider this a bug in the package, not in the code.
But if you really want to allow this, you have to write something like:
   $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;

Seems to work well, but no warranty. Happy regexing ;)
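
For anyone who wants to try it, a quick sketch comparing the plain
substitution from the patch with the lookahead variant. The sub names and
the sample text are invented for this example, and it is not tested
against real package descriptions.

```perl
use strict;
use warnings;

# Hypothetical helper: the plain substitution, escapes every '&'.
sub escape_amp_plain {
    my ($long_desc) = @_;
    $long_desc =~ s/\&/\&amp\;/go;
    return $long_desc;
}

# Hypothetical helper: skips '&' that already starts an entity such as
# &amp; or a numeric reference such as &#38; or &#x26;.
sub escape_amp_lookahead {
    my ($long_desc) = @_;
    $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
    return $long_desc;
}

my $sample = 'R&D tools, see http://example.org/?a=1&amp;b=2 &#38; more';

print escape_amp_plain($sample), "\n";
# R&amp;D tools, see http://example.org/?a=1&amp;amp;b=2 &amp;#38; more
print escape_amp_lookahead($sample), "\n";
# R&amp;D tools, see http://example.org/?a=1&amp;b=2 &#38; more
```

Note that the lookahead only checks the shape "&name;", so it cannot tell
a real entity from ordinary text that happens to look like one.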

Greetings,
	Frank

-- 
*** Frank Lichtenheld <frank@lichtenheld.de> ***
          *** http://www.djpig.de/ ***
see also: - http://www.usta.de/
          - http://fachschaft.physik.uni-karlsruhe.de/


