Bug#186740: Encode HTML special chars in $short_desc
Frank Lichtenheld said:
> Attached a patch that would encode HTML special chars in the short
> description (see #181872 for a similar discussion on long
> descriptions) both in all_packages and in the packages pages.
>
> The lines handling & are commented out. See the corresponding
> discussion in the bug mentioned above.
>
> Greetings,
> Frank
I would suggest (as I have irritatingly done elsewhere) that rather
than using fixed regexes, you use the SGML::ISO8859::str2sgml() or
HTML::Entities::encode_entities() functions on $short_desc and $long_desc.
Also, IMO it is better to fix '&' only when it doesn't look like an entity;
your proposed fix in #181872 will match all the &#...; entities but miss
nearly all of the named ones! You can easily import the entity hash
%char2entity from HTML::Entities and do the fixing yourself, or just do
a convoluted bit of passing the string back and forth between the
decode_entities() and encode_entities() functions to get it consistent.
For example:
[harp:~]$ perl -e 'use HTML::Entities; $foo = "&amp;foo blah & <url>\n"; print $foo, encode_entities($foo), encode_entities(decode_entities($foo)), decode_entities(encode_entities(decode_entities($foo)));'
... will yield the following:
&amp;foo blah & <url>                  [original string, yuck]
&amp;amp;foo blah &amp; &lt;url&gt;    [no good!]
&amp;foo blah &amp; &lt;url&gt;        [ahah, better, original &amp; is preserved]
&foo blah & <url>                      [gives us this when decoded, good!]
This way, if a package description has some encoded entities in it
already (eg &amp; in a URL) as well as unencoded things (eg '<'), you
would first run it through an SGML entity decoder, and then run the
output through an SGML entity encoder.
e.g.
use HTML::Entities ();
my $short_desc = $package{$_}{'short-desc'};
$short_desc = HTML::Entities::decode_entities($short_desc);
$short_desc = HTML::Entities::encode_entities($short_desc);
or:
use HTML::Entities (); # up the top of the script somewhere
...
$all_package .= "\n <dd>" .
    HTML::Entities::encode_entities(
        HTML::Entities::decode_entities(
            $package{$_}{'short-desc'} ) ) . "\n";
Then again for $long_desc ...
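Since the same two calls would be needed for both descriptions, here is
a minimal sketch of wrapping them in one helper. The sub name
normalize_entities and the sample description are made up for
illustration and are not part of pages.pl. The point it shows is that
running the helper a second time leaves the string unchanged, so text
that is already encoded does not get double-encoded.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical helper (not part of pages.pl): decode first, then encode,
# so already-encoded and raw characters end up in the same encoded form.
sub normalize_entities {
    my ($text) = @_;
    return HTML::Entities::encode_entities(
        HTML::Entities::decode_entities($text)
    );
}

my $desc  = 'fast &amp; small <tool>';    # made-up description
my $once  = normalize_entities($desc);
my $twice = normalize_entities($once);

print "$once\n";
# A second pass should not change anything.
print $once eq $twice ? "stable\n" : "changed\n";
```

The same helper could then be called on both $short_desc and $long_desc.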
Think on it anyway, it seems good to me but maybe you have some other
thoughts.
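For the other route mentioned above (fixing only the '&' that doesn't
look like an entity), here is a rough sketch of what I mean. The sub
names keep_or_escape and escape_bare_ampersands are hypothetical, and I
look names up in %entity2char rather than %char2entity because the
lookup here is by entity name. It only deals with '&'; '<' and '>' would
still need their own handling.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical helper: decide what to emit for one '&' match.
# $ref is the text between '&' and ';' (undef for a bare '&').
sub keep_or_escape {
    my ($ref) = @_;
    return '&amp;' unless defined $ref;              # bare '&'
    return "&$ref;" if substr($ref, 0, 1) eq '#';    # numeric reference
    # Assumption: the table may key a name with or without a trailing ';',
    # so both forms are checked.
    my $known = exists $HTML::Entities::entity2char{$ref}
             || exists $HTML::Entities::entity2char{"$ref;"};
    return $known ? "&$ref;" : "&amp;$ref;";
}

# Escape '&' unless it starts a numeric reference or a known named entity.
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s{&(?:(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)?}{keep_or_escape($1)}ge;
    return $text;
}

print escape_bare_ampersands('AT&T &amp; &#38; &bogus; &lt;b&gt;'), "\n";
```

Unknown names such as &bogus; are treated as plain text and get their
'&' escaped, which seems the safer default for package descriptions.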
> Index: htmlscripts/pages.pl
> ===================================================================
> RCS file: /cvs/webwml/packages/htmlscripts/pages.pl,v
> retrieving revision 1.10
> diff -u -IMD5 -r1.10 pages.pl
> --- htmlscripts/pages.pl 24 Mar 2003 15:05:57 -0000 1.10
> +++ htmlscripts/pages.pl 29 Mar 2003 15:23:42 -0000
> @@ -113,7 +113,11 @@
> if ($distrib =~ /(contrib|non-free|non-us|security)/o) {
> $all_package .= " [<font color=\"red\">$distrib</font>]\n";
> }
> - $all_package .= "\n <dd>".$package{$_}{'short-desc'}."\n";
> + my $short_desc = $package{$_}{'short-desc'};
> +# $short_desc =~ s/&/\&amp\;/go;
> + $short_desc =~ s/</\&lt\;/go;
> + $short_desc =~ s/>/\&gt\;/go;
> + $all_package .= "\n <dd>".$short_desc."\n";
> }
> $all_package .= "</dl>\n";
> $all_package .= trailer('../..');
> @@ -161,6 +165,9 @@
> }
> $short_desc = $package{$pack}{'short-desc'};
> $long_desc = $package{$pack}{'long-desc'};
> +# $short_desc =~ s/\&/\&amp\;/go;
> + $short_desc =~ s/</\&lt\;/go;
> + $short_desc =~ s/>/\&gt\;/go;
> $long_desc =~ s,<((URL:)?http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
> $long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;
> $long_desc =~ s/\A //o;
>
Andrew.
--
Andrew Shugg <andrew@neep.com.au> http://www.neep.com.au/
"Just remember, Mr Fawlty, there's always someone worse off than yourself."
"Is there? Well I'd like to meet him. I could do with a good laugh."