
Re: Bug#160284: ITP: po-debconf -- Manage translated Debconf templates files with gettext



Hi,

At Wed, 11 Sep 2002 02:58:59 -0400,
Glenn Maynard wrote:

> http://www.debian.or.jp/~kubota/unicode-symbols-unihan.html:

I am the author of the above document.

I do not think this problem can be solved completely.  My makeshift
solution is to use the Japanese glyph set, because Chinese and
Korean readers seem to be more tolerant of glyph differences,
while Japanese readers tend to insist on their own glyph shapes.
However, this problem concerns the *display system*, not the way
text is stored in files or memory.


There are more complex and more important problems, though.


http://www.debian.or.jp/~kubota/unicode-symbols-map2.html

This is the round-trip conversion problem.  It EXISTS, but
it is not simple to determine which part is the BUG.  I sent
mail to the Unicode Consortium asking them to solve this problem,
but I think they don't have enough political power to solve it....
Yes, huge POLITICAL POWER is needed to solve this problem, and
of course I don't have it.

If we never think about systems other than Debian or Linux, we
can avoid the problem I described in the document, because it is
caused by incompatible mapping tables between vendors.  However,
if we consider a conversion chain like:
UTF-8 ---(mapping using Windows)--> EUC-JP ---(mapping using
Linux)--> UTF-8, we will suffer from the problem.
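
The chain above can be sketched in Python.  This is only an
illustration under stated assumptions: the two dictionaries are
hand-written one-entry excerpts, not real vendor tables, and the
names WINDOWS_STYLE_ENCODE, LINUX_STYLE_DECODE and round_trip are
made up for this example.  The entry used is the well-known
JIS X 0208 row 1, cell 33 case (EUC-JP bytes A1 C1), which
Microsoft-style tables pair with U+FF5E FULLWIDTH TILDE and the
Unicode Consortium's JIS0208.TXT pairs with U+301C WAVE DASH.

```python
# Illustrative excerpts only: each table holds ONE entry, for JIS X 0208
# row 1, cell 33 (EUC-JP bytes A1 C1).  Real tables have thousands of entries.

# Microsoft-style table: the cell is paired with U+FF5E FULLWIDTH TILDE.
WINDOWS_STYLE_ENCODE = {"\uff5e": b"\xa1\xc1"}

# JIS0208.TXT-style table: the same cell is paired with U+301C WAVE DASH.
LINUX_STYLE_DECODE = {b"\xa1\xc1": "\u301c"}


def round_trip(ch: str) -> str:
    """Convert to EUC-JP with one vendor's table, then back with another's."""
    euc_jp_bytes = WINDOWS_STYLE_ENCODE[ch]
    return LINUX_STYLE_DECODE[euc_jp_bytes]


original = "\uff5e"
restored = round_trip(original)
print(f"U+{ord(original):04X} -> U+{ord(restored):04X}")
print("round trip preserved the character:", original == restored)
```

Each table is consistent on its own; the character changes only
because the two steps use different tables.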

ja.po files are usually written in EUC-JP.  This is simply because
EUC-JP is the most popular encoding in Japanese Linux environments,
including those of ja.po writers.  I think a ja.po in UTF-8
is completely OK, but the writer should be careful not to use
characters which cannot be mapped to EUC-JP, because most
users use EUC-JP.  (Note that which characters are unmappable
depends on the mapping table, so it can be affected by the mapping
table problem above.)
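
A writer could check a UTF-8 ja.po for such characters with
something like the following Python sketch.  The function name
unmappable_chars and the sample string are made up for this
example, and the result reflects the mapping table of Python's own
"euc_jp" codec, which is not necessarily the table a given user's
system applies.

```python
def unmappable_chars(text: str, encoding: str = "euc_jp") -> list:
    """Return the characters of `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # This codec's table has no entry for the character.
            bad.append(ch)
    return bad


# Hypothetical msgstr: Japanese text followed by one Hangul syllable,
# which is in neither JIS X 0208 nor JIS X 0212.
msgstr = "日本語のメッセージ 한"
print(unmappable_chars(msgstr))
```

An empty list means every character survived conversion with this
particular codec; it does not prove the same for other tables.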

http://www.debian.or.jp/~kubota/unicode-symbols-width2.html

Another problem is character width.  As you know, most CJK
characters are doublewidth, which means one character occupies
two columns on the console.  The rule is very simple -- characters
from ASCII and JIS X 0201 are singlewidth and characters from
JIS X 0208 and JIS X 0212 are doublewidth.  (The EUC-JP encoding
uses these coded character sets.)  However, this simple rule
is valid only in EUC-JP.  How should UTF-8-based terminals
behave?  There are many characters (mainly symbols, plus Cyrillic
and Greek letters) which are classified as East Asian Ambiguous in
Unicode Standard Annex #11.  The most problematic characters
are the ruler elements (box-drawing characters), I think.  However,
ruler elements are not often used in Debconf.
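
The ambiguity can be seen with Python's unicodedata module, which
exposes the East Asian Width property.  This is a rough sketch and
not how any particular terminal behaves: display_width and its
ambiguous_wide switch are made up for this example, and combining
and control characters are simply counted as one column.

```python
import unicodedata


def display_width(text: str, ambiguous_wide: bool = False) -> int:
    """Estimate console columns from the East Asian Width property."""
    columns = 0
    for ch in text:
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):        # Wide and Fullwidth: two columns
            columns += 2
        elif eaw == "A":             # Ambiguous: the terminal has to choose
            columns += 2 if ambiguous_wide else 1
        else:                        # Na, H, N: one column (simplification)
            columns += 1
    return columns


ruler = "\u2500"  # BOX DRAWINGS LIGHT HORIZONTAL, an Ambiguous character
print(display_width("abc"), display_width("日本"))
print(display_width(ruler), display_width(ruler, ambiguous_wide=True))
```

The same ruler element takes one or two columns depending on the
switch, which is the choice a UTF-8-based terminal has to make.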

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


