
Bug#186740: Encode HTML special chars in $short_desc



Frank Lichtenheld said:
> Attached a patch that would encode HTML special chars in the short
> description (see #181872 for a similar discussion on long
> descriptions) both in all_packages and in the packages pages.
> 
> The lines handling & are commented out. See the corresponding
> discussion in the bug mentioned above.
> 
> Greetings,
> 	Frank

I would suggest (as I have irritatingly done elsewhere) that rather
than using fixed regexes, you use the SGML::ISO8859::str2sgml() or
HTML::Entities::encode_entities() functions on $short_desc and $long_desc.

Also IMO it is better to fix only the '&' that doesn't look like part
of an entity; your proposed fix in 181872 will match all the &#...;
entities but miss nearly all of the named ones!  You can easily import
the entity hash %char2entity from HTML::Entities and do the fixing
yourself, or just do a convoluted bit of passing the string back and
forth between the encode_entities() and decode_entities() functions to
get it consistent.
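
To make the "fix it yourself" option concrete, here is a rough sketch
that needs no modules. It only judges the shape of what follows the
'&' (name or number, then ';') and does not check that a name is a real
entity; looking names up in the %entity2char hash that HTML::Entities
can also export would be stricter. The name escape_bare_amp is made up
for this sketch and is not in pages.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: escape '&' only where it does not already start
# something shaped like an entity reference (&name; &#123; &#x7b;).
# This is a shape heuristic only; unknown names such as &bogus; are
# left alone as well.
sub escape_bare_amp {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

print escape_bare_amp("&foo blah &amp; R&D &#169;\n");
# prints: &amp;foo blah &amp; R&amp;D &#169;
```

Note that this leaves '<' and '>' untouched, so they would still need
handling separately.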

For example, passing the string back and forth:

[harp:~]$ perl -e 'use HTML::Entities; $foo = "&foo blah &amp; <url>\n"; print $foo, encode_entities($foo), encode_entities(decode_entities($foo)), decode_entities(encode_entities(decode_entities($foo)));'

... will yield the following:

&foo blah &amp; <url>			[original string, yuck]
&amp;foo blah &amp;amp; &lt;url&gt;	[no good!]
&amp;foo blah &amp; &lt;url&gt;		[ahah, better, original &amp; is preserved]
&foo blah & <url>			[gives us this when decoded, good!]


So if a package description already has some encoded entities in it
(eg &amp; in a URL) as well as unencoded things (eg '<'), you would
first run it through an SGML entity decoder, and then run the output
through an SGML entity encoder.  Nothing gets encoded twice that way.

e.g.

  use HTML::Entities ();
  my $short_desc = $package{$_}{'short-desc'};
  $short_desc = HTML::Entities::decode_entities($short_desc);
  $short_desc = HTML::Entities::encode_entities($short_desc);

or:

  use HTML::Entities ();	# up the top of the script somewhere
  ...
  # decode first (innermost call), then encode the result
  $all_package .= "\n     <dd>" .
  	HTML::Entities::encode_entities(
  	HTML::Entities::decode_entities(
  	$package{$_}{'short-desc'} ) ) . "\n";

Then again for $long_desc ...
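
Since the same two calls are needed for both descriptions, a small
helper might save some repetition. This is only a sketch: the name
normalise_entities and the sample string are invented here, and it
assumes HTML::Entities is installed.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical helper, not part of pages.pl.
# Decode first so existing entities such as &amp; are not encoded a
# second time, then encode so every special character ends up escaped.
sub normalise_entities {
    my ($text) = @_;
    return HTML::Entities::encode_entities(
        HTML::Entities::decode_entities($text) );
}

# Made-up sample description mixing an encoded '&' with a raw '<' and '>'.
my $short_desc = normalise_entities('tools for &amp; <url> handling');
print "$short_desc\n";
# prints: tools for &amp; &lt;url&gt; handling
```

In pages.pl this would have to be called on $long_desc before the
regexes that insert the <a href=...> markup, since encoding afterwards
would escape those tags too.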

Think on it anyway, it seems good to me but maybe you have some other
thoughts.


> Index: htmlscripts/pages.pl
> ===================================================================
> RCS file: /cvs/webwml/packages/htmlscripts/pages.pl,v
> retrieving revision 1.10
> diff -u -IMD5 -r1.10 pages.pl
> --- htmlscripts/pages.pl	24 Mar 2003 15:05:57 -0000	1.10
> +++ htmlscripts/pages.pl	29 Mar 2003 15:23:42 -0000
> @@ -113,7 +113,11 @@
>  		if ($distrib =~ /(contrib|non-free|non-us|security)/o) {
>  			$all_package .= " [<font color=\"red\">$distrib</font>]\n";
>  		}
> -		$all_package .= "\n	<dd>".$package{$_}{'short-desc'}."\n";
> +		my $short_desc = $package{$_}{'short-desc'};
> +#		$short_desc =~ s/&/\&amp\;/go;
> +		$short_desc =~ s/</\&lt\;/go;
> +		$short_desc =~ s/>/\&gt\;/go;
> +		$all_package .= "\n	<dd>".$short_desc."\n";
>  	}
>  	$all_package .= "</dl>\n";
>  	$all_package .= trailer('../..');
> @@ -161,6 +165,9 @@
>  		}
>  		$short_desc = $package{$pack}{'short-desc'};
>  		$long_desc = $package{$pack}{'long-desc'};
> +#		$short_desc =~ s/\&/\&amp\;/go;
> +		$short_desc =~ s/</\&lt\;/go;
> +		$short_desc =~ s/>/\&gt\;/go;
>  		$long_desc =~ s,<((URL:)?http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
>  		$long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;
>  		$long_desc =~ s/\A //o;
> 

Andrew.

-- 
Andrew Shugg <andrew@neep.com.au>                   http://www.neep.com.au/

"Just remember, Mr Fawlty, there's always someone worse off than yourself."
"Is there?  Well I'd like to meet him.  I could do with a good laugh."


