Bug#181872: Patch

To: 181872@bugs.debian.org
Subject: Bug#181872: Patch
From: Frank Lichtenheld <frank@lichtenheld.de>
Date: Fri, 14 Mar 2003 15:16:23 +0100
Message-id: <20030314141623.GA762@djpig.de>
Reply-to: Frank Lichtenheld <frank@lichtenheld.de>, 181872@bugs.debian.org
In-reply-to: <20030313211214.GC6683@prvidomaci.srce.hr>
References: <20030313144254.GA732@djpig.de> <20030313172715.GB4218@prvidomaci.srce.hr> <20030313174635.GB732@djpig.de> <20030313185805.GH4218@prvidomaci.srce.hr> <20030313202728.GC732@djpig.de> <20030313211214.GC6683@prvidomaci.srce.hr>

On Thu, Mar 13, 2003 at 10:12:14PM +0100, Josip Rodin wrote:
> On Thu, Mar 13, 2003 at 09:27:28PM +0100, Frank Lichtenheld wrote:
> > Ok. Let's elaborate a little. Sorry if it's too long.
> 
> Oh, I understood perfectly what you said, I just meant to say that I thought
> the original code preserved URL: within the <a> tag by mistake.

Ok. But it is not within the <a> tag; it only converts
<URL:http://...> to &lt;URL:http://...&gt;. The regex that converts
http://... to <a href="http://...">http://...</a> is the one on the
line directly below it:
$long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;
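A minimal Perl sketch of that linkify step, applied to a description whose brackets have already been escaped. The wrapper name `linkify` and the sample text are hypothetical; only the substitution itself is taken from the mail. It shows that "URL:" stays outside the generated <a> tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the linkify substitution quoted above.
# Input is assumed to have its <URL:...> brackets already escaped.
sub linkify {
    my ($long_desc) = @_;
    $long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;
    return $long_desc;
}

# Only the http://... part is wrapped; "&lt;URL:" and "&gt;" stay outside.
print linkify('see &lt;URL:http://www.debian.org/&gt; for details'), "\n";
# prints: see &lt;URL:<a href="http://www.debian.org/">http://www.debian.org/</a>&gt; for details
```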

After all the discussion, I would propose the following patch (it
combines elements of both versions):

$long_desc =~ s,<((URL:)?\s*http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;

In the end it's your decision.
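A minimal Perl sketch of the proposed substitution on a few sample inputs. The wrapper name `escape_url_brackets` and the sample strings are hypothetical; the regex is the one proposed above, unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the proposed substitution: escapes the angle
# brackets around <URL:http://...> or <http://...>, matching up to the
# first closing '>'. Other tags (no http://) are left untouched.
sub escape_url_brackets {
    my ($long_desc) = @_;
    $long_desc =~ s,<((URL:)?\s*http://[^>]+)\s*>,\&lt\;$1\&gt\;,go;
    return $long_desc;
}

for my $in ('<URL:http://www.debian.org/>',
            '<http://www.debian.org/>',
            'see <URL: http://www.debian.org/> and <b>') {
    print escape_url_brackets($in), "\n";
}
# &lt;URL:http://www.debian.org/&gt;
# &lt;http://www.debian.org/&gt;
# see &lt;URL: http://www.debian.org/&gt; and <b>
```

Because `[^>]+` stops at the first `>`, this follows the "match until the first closing bracket" approach; a URL that itself contained `>` would be cut short.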

> Well, I think in principle it's much better to just match until the first
> closing bracket since IME such things are less prone to errors. Of course,
> if someone found URLs with <> in them, that idea goes down the drain...

Ok, let's wait until some package maintainer actually puts such a URL
into a description. Then we can handle it ;)

> > But if you want to really allow this you have to write something like:
> >    $long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
> > Seems to work good but no warranty. Happy regexing ;)
> 
> Not sure offhand why you both check the entity format and use a rather
> simple \w+ as an alternative... A sentence could end talking about
> Barnes&Noble; and then it could be followed by another sentence :)

Hmm, I see the problem. The only solution seems to be a list of
allowed entities:
$long_desc =~ s/\&(?!(?:#x?[\da-fA-F]+|amp|gt|lt|quot)\;)/\&amp\;/go;
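A minimal Perl sketch comparing the earlier `\w+` variant with the whitelist variant on the Barnes&Noble; case. The sub names and the sample string are hypothetical; both regexes are copied from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Earlier variant: any \w+ followed by ';' is treated as an entity.
sub escape_amp_word {
    my ($s) = @_;
    $s =~ s/\&(?!(?:#x?[\da-fA-F]+|\w+)\;)/\&amp\;/go;
    return $s;
}

# Whitelist variant: only numeric references and amp/gt/lt/quot are kept.
sub escape_amp_whitelist {
    my ($s) = @_;
    $s =~ s/\&(?!(?:#x?[\da-fA-F]+|amp|gt|lt|quot)\;)/\&amp\;/go;
    return $s;
}

my $desc = 'Barnes&Noble; AT&T &amp; &#x41;';
print escape_amp_word($desc), "\n";       # Barnes&Noble; AT&amp;T &amp; &#x41;
print escape_amp_whitelist($desc), "\n";  # Barnes&amp;Noble; AT&amp;T &amp; &#x41;
```

The trade-off is that any legitimate entity not on the list, such as `&nbsp;`, is escaped too and comes out as `&amp;nbsp;`.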

Greetings,
	Frank

-- 
*** Frank Lichtenheld <frank@lichtenheld.de> ***
          *** http://www.djpig.de/ ***
see also: - http://www.usta.de/
          - http://fachschaft.physik.uni-karlsruhe.de/

Follow-Ups:
- Bug#181872: Patch
  - From: Gerfried Fuchs <alfie@ist.org>

References:
- Bug#181872: Patch
  - From: Frank Lichtenheld <frank@lichtenheld.de>
- Bug#181872: Patch
  - From: Josip Rodin <joy@srce.hr>
- Bug#181872: Patch
  - From: Frank Lichtenheld <frank@lichtenheld.de>
- Bug#181872: Patch
  - From: Josip Rodin <joy@srce.hr>
- Bug#181872: Patch
  - From: Frank Lichtenheld <frank@lichtenheld.de>
- Bug#181872: Patch
  - From: Josip Rodin <joy@srce.hr>
