Package description pages do not handle multibyte characters
Hi,
I received a report that the Japanese translations of the package
description pages (by the Debian Description Translation Project)
are broken.
For example,
http://ddtp.debian.org/packages.debian.org/stable/admin/apmd.ja.html
(It may be difficult to see that the page is broken if you
cannot read Japanese.)
Analysis:
This page seems to be generated by a script
gluck.debian.org:/org/packages.debian.org/htmlscripts/pages.pl
and the first character of a package's long description
is set in a larger font:
$long_desc =~ /^([^&]|&[^;]+;)/;
$first = $1;
$rest = substr($long_desc,length($first));
$package_page .= "<p style=\"text-align: justify\"><font size=\"+2\">$first</font>$rest\n";
However, in multibyte encodings such as EUC-JP (Japanese),
a character may consist of multiple bytes. On the other
hand, the expression [^&] matches one *byte* rather than
one *character*. Thus, when the first character of the
long description is a multibyte character, $first will be
only the first byte of that character, not the entire
character. The </font> tag is then written in the middle of
the character, so the browser can no longer decode it.
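The difference can be reproduced outside the script. What follows
is only a minimal sketch, not the actual pages.pl code: it assumes
Perl 5.8 or later with the core Encode module, and the names
split_bytes and split_chars are hypothetical. It applies the same
pattern once to the raw bytes and once to the decoded text.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same pattern as in pages.pl: one character, or one entity like "&amp;".
my $first_unit = qr/^([^&]|&[^;]+;)/;

# Byte-level split, which is what the script effectively does now.
sub split_bytes {
    my ($bytes) = @_;
    my ($first) = $bytes =~ $first_unit or return ('', $bytes);
    return ($first, substr($bytes, length($first)));
}

# Hypothetical character-level split: decode, split, then encode again.
# $encoding (e.g. 'euc-jp') is assumed to be known to the caller.
sub split_chars {
    my ($bytes, $encoding) = @_;
    my $text = decode($encoding, $bytes);
    my ($first) = $text =~ $first_unit or return ('', $bytes);
    return (encode($encoding, $first),
            encode($encoding, substr($text, length($first))));
}

# "\xA4\xA2\xA4\xA4" is two hiragana letters (A, I) in EUC-JP.
my $desc = "\xA4\xA2\xA4\xA4";
my ($byte_first) = split_bytes($desc);
my ($char_first) = split_chars($desc, 'euc-jp');
printf "byte-level: %d byte(s), character-level: %d byte(s)\n",
    length($byte_first), length($char_first);
```

The byte-level split returns a single byte, which is half of the
first character, while the character-level split returns both
bytes of it.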
Solution:
The right way is to make the script multibyte-compliant. It may
be difficult to support arbitrary encodings. However, it should
be easy to support the limited set of multibyte encodings
which are likely candidates for Debian web pages (such as
EUC-JP, EUC-KR, GB2312, Big5, Big5HKSCS, and UTF-8).
I have heard of another solution, like the following:
<style>
<!--
p.description {text-align: justify;}
p.description:first-letter {font-size: 150%;}
-->
</style>
and
<p class="description">This is a long package description.</p>
Though the result of this solution is environment-dependent,
at least it never makes the content unreadable.
Yet another solution is to give up decorating the first
character with a larger font. I think this might be a good
solution because "using a larger font for the first character"
cannot be truly universal. Imagine Arabic characters. How
can the first character of an Arabic word be made larger?
Though we don't have an Arabic translation yet, we may have
one in the future.
Thus, my suggestion is to give up the decoration. However,
I would appreciate any other solution that stops breaking
the content.
---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/