
Re: Accepted po-debconf 0.2.2 (all source)



Hi,

At Mon, 16 Sep 2002 18:08:26 +0900,
Tomohiro KUBOTA wrote:

> My opinion on Unicode and Debconf: there are no problem on usage of
> Unicode for storing data or internal processing of Debconf for
> Japanese, as same as other languages.

Addition:

This does not mean that I insist that UTF-8 be used for storing
Debconf translations.  I only said that using UTF-8 does not cause
any problems.

Both approaches have merits and demerits.  The merits of each are:

Currently, most users use a local encoding such as ISO-8859-15 or
EUC-JP.  Thus, if Debconf translations are stored in the popular
encoding for each language (the current situation), we can skip the
conversion step in these cases, which gives better performance.  In
the (fewer) cases of users who use a different encoding (UTF-8 for
Japanese, ISO-8859-5 for Russian, ...), conversion is needed.
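
As a rough illustration of this scheme, here is a minimal sketch
(in Python only for brevity; Debconf itself is written in Perl).
The function name and its arguments are hypothetical.  It hands back
the stored bytes untouched when the storage encoding already matches
the user's encoding, and converts only otherwise.

```python
import codecs


def text_for_user(stored: bytes, stored_encoding: str, user_encoding: str) -> bytes:
    """Return template bytes in the user's encoding, converting only if needed.

    Hypothetical helper for illustration; not part of Debconf.
    """
    # Normalize aliases such as "EUC-JP" and "eucjp" to one canonical name.
    same = codecs.lookup(stored_encoding).name == codecs.lookup(user_encoding).name
    if same:
        return stored  # common case: no conversion cost at all
    # Less common case: decode from the stored encoding, re-encode for the user.
    return stored.decode(stored_encoding).encode(user_encoding)
```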

On the other hand, if UTF-8 is used for storing Debconf translations,
the internal implementation of Debconf could be simpler.  All we would
have to do is compile-time conversion (from the translators' encodings
to UTF-8) and run-time conversion (from UTF-8 to the user's locale
encoding).
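
The two conversion steps might look like the following sketch (again
in Python for brevity).  Both function names are hypothetical, and
the choice to substitute "?" for characters the locale encoding
cannot represent is only an assumption; what Debconf should do in
that case is a separate question.

```python
def compile_template(translated: bytes, translator_encoding: str) -> bytes:
    """Build-time step: convert a translator's text to UTF-8 for storage."""
    return translated.decode(translator_encoding).encode("utf-8")


def display_template(stored_utf8: bytes, locale_encoding: str) -> bytes:
    """Run-time step: convert stored UTF-8 to the user's locale encoding."""
    # errors="replace" writes "?" for each character that the locale
    # encoding cannot represent (an assumed policy, for illustration only).
    return stored_utf8.decode("utf-8").encode(locale_encoding, errors="replace")
```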


Additional comments:

I think UTF-8 will become more and more popular in the future (on the
order of ten years).  Thus, the "no conversion needed for most users"
merit of the current situation will shrink.

How to determine the width of characters.  In C with GNU libc,
wcwidth() works very well, including for characters whose width
changes depending on the locale (East Asian Ambiguous characters and
exceptions).  I don't know how to invoke this function from Perl.
If this cannot be handled correctly, East Asian people will hate a
UTF-8-based Debconf.
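
To show what the width problem involves, here is a minimal sketch in
Python.  It is not wcwidth() and does not reproduce glibc's
exceptions; it only approximates the rule from the Unicode East Asian
Width property, and the cjk_locale flag is a hypothetical stand-in
for "the user's locale is a CJK one".  Control characters are not
handled.

```python
import unicodedata


def display_width(text: str, cjk_locale: bool = False) -> int:
    """Approximate the number of terminal columns needed for text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            width += 2  # Wide and Fullwidth: always two columns
        elif eaw == "A" and cjk_locale:
            width += 2  # Ambiguous: two columns only in CJK locales
        else:
            width += 1  # everything else: one column
    return width
```

With this rule a Greek letter, which is East Asian Ambiguous, counts
as one column by default and as two columns when cjk_locale is set,
which is the locale-dependent behaviour that has to be right.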

Usage of iconv in Perl.  The "libtext-iconv-perl" package Depends: on
the "perl" package, which is very large.  I don't know whether
translated Debconf templates are used during initial installation;
if they are, this will increase the size of the installation floppy.

Perl-5.8's new "Encode" module could be used for conversion.  However,
Perl-5.8 seems to have its own mapping tables.  I have not done so
yet, but we have to check whether these tables are the same as glibc's
tables.  Otherwise, round-trip conversion problems can occur.
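
Such a check could be sketched as follows (in Python for brevity).
The helper names are hypothetical.  The two decoders are passed in as
plain functions, so in a real test one would wrap Perl's Encode and
glibc's iconv; nothing here says anything about their actual tables.

```python
from typing import Callable, Iterable, List, Tuple

Decoder = Callable[[bytes], str]


def round_trips(text: str, encoding: str) -> bool:
    """True if text survives encoding and decoding with one codec."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False  # not representable in this encoding at all


def mapping_differences(
    samples: Iterable[bytes], decode_a: Decoder, decode_b: Decoder
) -> List[Tuple[bytes, str, str]]:
    """List the byte sequences that the two decoders map differently."""
    differences = []
    for raw in samples:
        a, b = decode_a(raw), decode_b(raw)
        if a != b:
            differences.append((raw, a, b))
    return differences
```

As a stand-in for two implementations that disagree, Python's own
"shift_jis" and "cp932" codecs decode the bytes 0x81 0x60 to
different characters, so mapping_differences() reports that sequence
when given those two decoders.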

Translated template files in /var/lib/dpkg/info/*.templates can be
regarded as binary files, so I don't mind the coexistence of various
encodings in one file.  However, it is *evil* for template files in
source packages to have various encodings in one file.  Using UTF-8
could solve this problem, but this forces translators to use UTF-8,
which is not a good idea because there are not many text editors which
can edit UTF-8 including all the needed character ranges.  The best
and simplest way is to require package maintainers to keep a separate
template file for each language.


Conclusion:

UTF-8-based storage is a bit better, provided that the above problems
(usage of wcwidth() and the requirement of the large "perl" package)
are solved.

Translated templates in source packages should be separated by
language.  Translators should be able to choose their preferred
encoding, at least between UTF-8 and the "popular encoding" for that
language.  (The "popular encoding" should be documented.)

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


