
Bug#181872: Patch



On Thu, Mar 13, 2003 at 09:27:28PM +0100, Frank Lichtenheld wrote:
> > > > The right fix would be simply
> > > > $long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
> > > > 
> > > > Right?
> > > 
> > > Yours would do also. The main difference in result is that you delete
> > > the 'URL:' while mine preserves it. Only a cosmetic difference.
> > 
> > Actually I did that off the top of my head, focusing on the [^>] part.
> > I thought that the "URL:" part was included in the anchor, but I guess
> > that's handled by some other part of the code.
> 
> Ok. Let's elaborate a little. Sorry if it's too long.

Oh, I understood perfectly what you said, I just meant to say that I thought
the original code preserved URL: within the <a> tag by mistake.

> Whether you write [\S~-]+?> or [^>]+> should make no big difference (you
> are allowing more chars), especially because the first one is a non-greedy
> match.

Well, I think in principle it's much better to just match until the first
closing bracket since IME such things are less prone to errors. Of course,
if someone found URLs with <> in them, that idea goes down the drain...
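
For what it's worth, here is a small sketch of the "match until the first
closing bracket" version in isolation. The sample strings and the
escape_urls helper name are made up for illustration; this is not the actual
code from the script, only the substitution quoted at the top of this mail
wrapped in a sub.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution discussed in this thread.
sub escape_urls {
    my ($long_desc) = @_;
    # Take everything up to the first '>' as the URL; an optional
    # "URL:" prefix is matched but not captured, so it is dropped.
    $long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,&lt;$1&gt;,g;
    return $long_desc;
}

# Made-up sample descriptions, not taken from any real package.
my @samples = (
    'See <URL:http://example.org/~user/> for details.',
    'Home page: <http://example.org/a-b?x=1>',
);

print escape_urls($_), "\n" for @samples;
```

Note that [^>]+ is greedy, so any whitespace before the closing bracket ends
up inside $1 and the trailing \s* never has anything left to match.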

> But why would someone do this? The main place where a long description
> is displayed is a package manager (dselect/aptitude) not a website. I
> would consider this a bug in the package, not in the code.

Er, if the right thing to do in HTML is to encode the ampersands, I wouldn't
expect it to be a bug to encode them everywhere.

> But if you want to really allow this you have to write something like:
>    $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
> Seems to work well but no warranty. Happy regexing ;)

Not sure offhand why you both check the entity format and use a rather
simple \w+ as an alternative... A sentence could end talking about
Barnes&Noble; and then it could be followed by another sentence :)
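
To make that concrete, a rough sketch follows. The first sub is the
substitution quoted above; the second is only one possible tightening and is
purely my assumption: it accepts numeric references plus a short hard-coded
list of entity names (amp, lt, gt, quot). Both sub names and the sample
sentence are invented for the example.

```perl
use strict;
use warnings;

# The quoted substitution: skip anything of the form &word; or &#digits;
sub escape_amp_loose {
    my ($text) = @_;
    $text =~ s/&(?!(?:#x?[\da-fA-F]+|\w+);)/&amp;/g;
    return $text;
}

# Assumed alternative: only numeric references and four named entities
# are treated as already encoded; every other '&' gets escaped.
sub escape_amp_strict {
    my ($text) = @_;
    $text =~ s/&(?!(?:#\d+|#x[\da-fA-F]+|amp|lt|gt|quot);)/&amp;/g;
    return $text;
}

# Made-up sentence containing a false entity and a real one.
my $sentence = 'I bought it at Barnes&Noble; it was cheap &amp; good.';

print escape_amp_loose($sentence), "\n";   # leaves "Barnes&Noble;" alone
print escape_amp_strict($sentence), "\n";  # escapes the '&' in "Barnes&Noble;"
```

The strict variant would of course need a longer list to cope with
descriptions that legitimately use other named entities.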

-- 
     2. That which causes joy or happiness.


