Bug#181872: Patch

To: Frank Lichtenheld <frank@lichtenheld.de>, 181872@bugs.debian.org
Subject: Bug#181872: Patch
From: Josip Rodin <joy@srce.hr>
Date: Thu, 13 Mar 2003 22:12:14 +0100
Message-id: <20030313211214.GC6683@prvidomaci.srce.hr>
Reply-to: Josip Rodin <joy@srce.hr>, 181872@bugs.debian.org
In-reply-to: <20030313202728.GC732@djpig.de>
References: <20030313144254.GA732@djpig.de> <20030313172715.GB4218@prvidomaci.srce.hr> <20030313174635.GB732@djpig.de> <20030313185805.GH4218@prvidomaci.srce.hr> <20030313202728.GC732@djpig.de>

On Thu, Mar 13, 2003 at 09:27:28PM +0100, Frank Lichtenheld wrote:
> > > > The right fix would be simply
> > > > $long_desc =~ s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
> > > > 
> > > > Right?
> > > 
> > > Yours would do also. The main difference in result is that you delete
> > > the 'URL:' while mine preserves it. Only a cosmetic difference.
> > 
> > Actually I did that off the top of my head, focusing on the [^>] part.
> > I thought that the "URL:" part was included in the anchor, but I guess
> > that's handled by some other part of the code.
> 
> Ok. Let's elaborate a little. Sorry if it's too long.

Oh, I understood perfectly what you said, I just meant to say that I thought
the original code preserved URL: within the <a> tag by mistake.
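For anyone wanting to try the substitution proposed at the top of the thread outside the package scripts, here is a Python port of that Perl line. It is only a sketch: the helper name escape_bracketed_urls is hypothetical, `re.sub` replacing every match stands in for Perl's /g, and compiling the pattern once stands in for /o.

```python
import re

# Python port (an assumption; the thread only shows the Perl form) of
#   s,<(?:URL:\s*)?(http://[^>]+)\s*>,\&lt\;$1\&gt\;,go
# The optional "URL:" prefix is consumed by the match and therefore dropped;
# the URL itself is captured up to the first closing ">".
BRACKETED_URL = re.compile(r"<(?:URL:\s*)?(http://[^>]+)\s*>")


def escape_bracketed_urls(long_desc):
    """Hypothetical helper: turn <URL:http://...> and <http://...> into &lt;...&gt;."""
    # In a Python replacement string "&" is literal, so it needs no escaping.
    return BRACKETED_URL.sub(r"&lt;\1&gt;", long_desc)


print(escape_bracketed_urls("See <URL:http://www.debian.org/> for details."))
# prints: See &lt;http://www.debian.org/&gt; for details.
```

One side effect worth knowing: because `[^>]+` is greedy and also matches whitespace, the trailing `\s*` never consumes anything, so a space just before the `>` ends up inside the captured URL.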

> Whether you write [\S~-]+?> or [^>]+> should make no big difference (you
> are allowing more chars), especially because the first one is a non-greedy
> match.

Well, I think in principle it's much better to just match up to the first
closing bracket, since in my experience such patterns are less error-prone.
Of course, if someone found URLs with <> in them, that idea goes down the
drain...
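To make the difference between the two character classes concrete, the sketch below isolates them in two hypothetical patterns (NON_GREEDY_NONSPACE and UP_TO_FIRST_BRACKET, with a hypothetical helper first_url) and runs them on sample text, in Python rather than Perl. Since `~` and `-` are already non-whitespace characters, `[\S~-]` matches exactly what `\S` matches.

```python
import re

# Hypothetical patterns that isolate the two character classes being compared.
# "~" and "-" are non-whitespace, so [\S~-] is equivalent to \S here.
NON_GREEDY_NONSPACE = re.compile(r"<(http://[\S~-]+?)>")
UP_TO_FIRST_BRACKET = re.compile(r"<(http://[^>]+)>")


def first_url(pattern, text):
    """Return the first captured URL, or None if the pattern finds no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


for text in ["see <http://www.debian.org/~joy/>",  # ordinary URL
             "see <http://example.org/a b>"]:       # URL containing a space
    print(first_url(NON_GREEDY_NONSPACE, text), first_url(UP_TO_FIRST_BRACKET, text))
```

On ordinary URLs both stop at the first `>` and agree; they only diverge when whitespace appears before the closing bracket, where the non-greedy `\S` version refuses to match at all. Neither copes with a URL that itself contains `>`.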

> But why would someone do this? The main place where a long description
> is displayed is a package manager (dselect/aptitude) not a website. I
> would consider this a bug in the package, not in the code.

Er, if the right thing to do in HTML is to encode ampersands, I wouldn't
expect it to be a bug to encode them everywhere.
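Encoding every ampersand unconditionally is what Python's `html.escape` does (with `quote=False` it also encodes `<` and `>` but leaves quotes alone). The sketch below, built around a hypothetical helper escape_everything, shows that approach and its one side effect: an entity already present in a description gets encoded a second time, which is presumably the case the lookahead regex quoted below is meant to avoid.

```python
import html


def escape_everything(text):
    """Hypothetical helper: encode every &, < and > with no exceptions."""
    return html.escape(text, quote=False)


print(escape_everything("AT&T <http://example.org/>"))
# prints: AT&amp;T &lt;http://example.org/&gt;
print(escape_everything("already escaped: AT&amp;T"))
# prints: already escaped: AT&amp;amp;T
```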

> But if you want to really allow this you have to write something like:
>    $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
> Seems to work well, but no warranty. Happy regexing ;)

Not sure offhand why you check the numeric entity format so carefully and
then accept a rather simple \w+ as the alternative... A sentence could end
by mentioning Barnes&Noble; and be followed by another sentence, and that
&Noble; would be left unescaped :)
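The Barnes&Noble; case can be reproduced with a Python port of the quoted lookahead regex (LOOSE below). For comparison, STRICT is a hypothetical tighter variant, not something proposed in the thread: it checks decimal and hex character references separately (the original `#x?[\da-fA-F]+` also accepts something like `&#ff;`) and only treats `&name;` as an entity when the name appears in the HTML 4 set exposed as `html.entities.name2codepoint`. The helper escape_amps is likewise hypothetical.

```python
import re
from html.entities import name2codepoint

# Python port of the Perl substitution quoted above:
#   s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go
# Any "&" not followed by something entity-shaped plus ";" becomes "&amp;".
LOOSE = re.compile(r"&(?!(?:#x?[\da-fA-F]+|\w+);)")

# Hypothetical stricter variant: decimal and hex references are checked
# separately, and only known HTML 4 entity names count as entities.
_NAMES = "|".join(re.escape(name) for name in sorted(name2codepoint))
STRICT = re.compile(r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|(?:" + _NAMES + r"));)")


def escape_amps(pattern, text):
    """Replace every '&' that the pattern judges to be bare with '&amp;'."""
    return pattern.sub("&amp;", text)


text = "AT&T, &amp; and &#233; at Barnes&Noble; next sentence"
print(escape_amps(LOOSE, text))   # &Noble; slips through as a fake entity
print(escape_amps(STRICT, text))  # &Noble; is escaped, real entities are kept
```

A whitelist is still a heuristic: prose that happens to contain a real entity name followed by a semicolon (say, a literal "&lt;") cannot be told apart from markup.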

-- 
     2. That which causes joy or happiness.
