
Re: support for multilingual Packages files?



Hi,

I'd like to state a few premises.

1. An ideal system would be aware of the LC_CTYPE locale category and
   support almost all popular encodings (ISO-8859-*, KOI8-R, KOI8-U,
   EUC-JP, CN-GB, EUC-KR, Big5, TIS-620, TCVN, and so on).
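Being "aware of LC_CTYPE" boils down to asking the locale which encoding is in effect instead of assuming one. A minimal sketch in Python follows; the function name current_codeset is hypothetical, and locale.setlocale/locale.nl_langinfo are Python's wrappers around the C calls setlocale(LC_CTYPE, "") and nl_langinfo(CODESET), available on Unix-like systems only.

```python
import locale

def current_codeset() -> str:
    """Return the character encoding selected by the LC_CTYPE category."""
    try:
        # Like setlocale(LC_CTYPE, "") in C: take LC_ALL / LC_CTYPE / LANG
        # from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The requested locale is not installed; fall back to the "C" locale.
        locale.setlocale(locale.LC_CTYPE, "C")
    # nl_langinfo(CODESET) names the encoding, e.g. "ISO-8859-1",
    # "KOI8-R" or "EUC-JP", depending on the user's locale.
    return locale.nl_langinfo(locale.CODESET)

if __name__ == "__main__":
    print(current_codeset())
```

Run under LANG=ru_RU.KOI8-R or LANG=ja_JP.eucJP, the same program reports a different codeset, and output should then be produced in that encoding.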

2. Today UTF-8 support on Linux is very poor, and it may take more
   than a few years to mature.  (For example, the Linux console's
   Unicode support [via unicode_start] is too limited to display CJK
   ideographs.)


> sure, go ahead (as long as you use international encoding, i.e. UTF-8)
> (of course, "good manners" say you should include transcription
> of your name in latin script - not necessarily ASCII! - as well,
> for the benefit of us kanji-challenged)

No.  Someone may want to use dselect under the LANG=de_DE.ISO-8859-1,
LANG=th_TH.TIS620, or LANG=ru_RU.KOI8-R locales, and then it will
fail to display my name.  This is just like running dselect under
LANG=ja_JP.eucJP and seeing it fail to display ISO-8859-1 letters.
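The failure can be checked mechanically: a kanji name simply has no representation in the encodings of those other locales. The sketch below uses a hypothetical sample string (not my actual name) and Python codec names that I have paired with the locales above; note that being encodable is necessary but not sufficient for display, since a suitable font is also needed.

```python
# Hypothetical non-Latin sample text standing in for a name written in kanji.
SAMPLE = "日本語"

# Locales mentioned above, paired with the Python codec for their encoding.
LOCALES = {
    "de_DE.ISO-8859-1": "iso8859_1",
    "th_TH.TIS620": "tis_620",
    "ru_RU.KOI8-R": "koi8_r",
    "ja_JP.eucJP": "euc_jp",
}

def representable(text: str, codec: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

if __name__ == "__main__":
    for name, codec in LOCALES.items():
        print(f"{name:18} {representable(SAMPLE, codec)}")
```

Only the eucJP locale can represent the sample at all; in the other three, a program has nothing valid to send to the terminal.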


> > ASCII is international (i.e., shared all around the world) but ISO-8859-1
> 
> ASCII is American, not international (American Standard Code ....)
> It just happened that USA acquired world supremacy and is forcing
> its standards to everyone else.

Then I will call it ISO 646, which is identical to ASCII.  It can also
be called "the lower half of ISO 8859" or "the 7-bit part of Unicode".
Anyway, all popular encodings used for LC_CTYPE in Debian share
ISO 646 (or ASCII) as a subset.  This is why all non-translated
messages and texts should be written in ISO 646: a non-translated
message may be displayed in any locale, so it has to use the
character set common to all encodings, which is ISO 646.
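The "common subset" argument can be demonstrated directly: text restricted to ISO 646 (taken here as its ASCII-identical reference version) encodes to the very same bytes in every encoding listed earlier, while anything outside it either differs or cannot be encoded at all. This is a sketch with hypothetical helper names; TCVN is left out because Python ships no codec for it, and str.isascii needs Python 3.7 or later.

```python
# Python codec names for the popular LC_CTYPE encodings named above (TCVN omitted).
CODECS = ["iso8859_1", "iso8859_5", "koi8_r", "koi8_u",
          "euc_jp", "gb2312", "euc_kr", "big5", "tis_620"]

def is_iso646(text: str) -> bool:
    """True if text uses only the 7-bit ISO 646 / ASCII repertoire."""
    return text.isascii()

def same_bytes_everywhere(text: str) -> bool:
    """True if text encodes to identical bytes in every listed encoding."""
    try:
        return len({text.encode(c) for c in CODECS}) == 1
    except UnicodeEncodeError:
        # Some encoding cannot represent the text at all.
        return False

if __name__ == "__main__":
    for msg in ("Description: package manager", "Größe"):
        print(repr(msg), is_iso646(msg), same_bytes_everywhere(msg))
```

A check like is_iso646 is the kind of test one could apply to non-translated fields before they go into a shared file.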


> But good i18n support will mean UTF-8 aware curses and slang, which in turn
> means dselect can display packages' descriptions in UTF-8 without 
> any problems, which in turn means there would be no reason to limit
> Packages file to ASCII, which in turn means we should mandate all
> the non-ascii fields to change into UTF-8, which is part of what
> we are discussing now. Whoa, what a sentence :-)

UTF-8 is merely one of the many encodings that Debian will
have to support.  Yes, good i18n support will _include_ UTF-8 aware
curses and slang, but curses and slang will have to support not only
UTF-8 but also all the other popular encodings.  Having UTF-8 support
should not mean that software works well only under UTF-8; that
would go against LC_CTYPE support.
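One way to honor LC_CTYPE even when some input happens to be UTF-8 is to decode it once and re-encode it for whatever codeset the locale selects at output time, much as iconv(3) does. The sketch below is illustrative only: the function name and the choice to substitute '?' for unrepresentable characters are my assumptions, not dselect's behavior.

```python
def recode_for_locale(data: bytes, source: str, codeset: str) -> bytes:
    """Re-encode bytes from `source` into the locale's `codeset`.

    Characters the target encoding cannot represent become '?'
    instead of aborting the whole output.
    """
    text = data.decode(source)                  # bytes -> Unicode string
    return text.encode(codeset, errors="replace")

if __name__ == "__main__":
    field = "Größe".encode("utf-8")             # e.g. a UTF-8 description field
    for codeset in ("UTF-8", "ISO-8859-1", "KOI8-R"):
        print(codeset, recode_for_locale(field, "utf-8", codeset))
```

In a real program, codeset would come from nl_langinfo(CODESET), so a UTF-8 source is just one input among many rather than a requirement on the user's terminal.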

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


