Bug#181872: Patch
On Thu, Mar 13, 2003 at 09:27:28PM +0100, Frank Lichtenheld wrote:
> > > > The right fix would be simply
> > > > $long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
> > > >
> > > > Right?
> > >
> > > Yours would do also. The main difference in result is that you delete
> > > the 'URL:' while mine preserves it. Only a cosmetic difference.
> >
> > Actually I did that off the top of my head, focusing on the [^>] part.
> > I thought that the "URL:" part was included in the anchor, but I guess
> > that's handled by some other part of the code.
>
> Ok. Let's elaborate a little. Sorry if it's too long.
Oh, I understood perfectly what you said, I just meant to say that I thought
the original code preserved URL: within the <a> tag by mistake.
> Whether you write [\S~-]+?> or [^>]+> should make no big difference (you
> are allowing more chars), especially because the first one is a non-greedy
> match.
Well, I think in principle it's much better to just match until the first
closing bracket since IME such things are less prone to errors. Of course,
if someone found URLs with <> in them, that idea goes down the drain...
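
A minimal sketch of the two character classes side by side, in case it helps. The sub names and sample strings are made up for illustration, the non-greedy pattern is only my guess at how that fragment fits into the full expression, and the replacement simply re-wraps the URL in entity-escaped brackets, which may not be exactly what the real code emits.

```perl
use strict;
use warnings;

# Hypothetical helpers, not taken from the actual script: each applies one
# of the two patterns under discussion to a copy of the description text.
sub linkify_until_bracket {
    my ($text) = @_;
    # Match everything up to the first closing bracket; "URL:" is dropped.
    $text =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,&lt;$1&gt;,g;
    return $text;
}

sub linkify_nongreedy {
    my ($text) = @_;
    # Non-greedy match that only accepts non-whitespace characters.
    $text =~ s,<(?:URL:\s*)?(http://[\S~-]+?)>,&lt;$1&gt;,g;
    return $text;
}

my $plain  = "See <URL:http://example.org/doc> for details.";
my $spacey = "See <http://example.org/a b> for details.";

print linkify_until_bracket($plain),  "\n";
print linkify_nongreedy($plain),      "\n";
print linkify_until_bracket($spacey), "\n";
print linkify_nongreedy($spacey),     "\n";
```

On the plain input both versions should give the same result. They only diverge once whitespace shows up inside the brackets: [^>]+ accepts it, while the \S-based class does not match at all and leaves the text untouched.
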
> But why would someone do this? The main place where a long description
> is displayed is a package manager (dselect/aptitude) not a website. I
> would consider this a bug in the package, not in the code.
Er, if the right thing to do in HTML is to encode the ampersands, I wouldn't
expect it to be a bug to encode them everywhere.
> But if you want to really allow this you have to write something like:
> $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
> Seems to work well, but no warranty. Happy regexing ;)
Not sure offhand why you both check the entity format and use a rather
simple \w+ as an alternative... A sentence could end talking about
Barnes&Noble; and then it could be followed by another sentence :)
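
To make the Barnes&Noble; case concrete, here is a quick sketch. The escape_amp name and the sample strings are made up for illustration, and it assumes the replacement is meant to be a literal &amp;.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution quoted above.
sub escape_amp {
    my ($text) = @_;
    # Escape every "&" that is not followed by something that looks like
    # an entity: a numeric reference or a bare \w+ name, then ";".
    $text =~ s/&(?!(?:#x?[\da-fA-F]+|\w+);)/&amp;/g;
    return $text;
}

print escape_amp("AT&T and R&D"), "\n";                      # both ampersands get escaped
print escape_amp("already &amp; done"), "\n";                # existing entity is left alone
print escape_amp("shop at Barnes&Noble; then leave"), "\n";  # also left alone
```

The last line is the catch: since \w+ accepts any word, "&Noble;" passes for an entity and the ampersand goes out unescaped.
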
--
2. That which causes joy or happiness.