
Re: support for multilingual Packages files?



Hi,

At Fri, 6 Jul 2001 16:11:59 +0200,
Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk> wrote:

> But xterm is better and getting better all the time.

Agreed.  Xterm is the best Unicode software as far as I know.
It supports doublewidth characters and combining characters.
There is a patch to enable bidi (Arabic/Hebrew) and the higher
planes (U+10000 and above).  However, we are now discussing the
Packages file, which is used during the installation process.  Thus,
we definitely need UTF-8 support in the Linux console if we want to
use UTF-8 for the Packages file.

> > No.  Someone may want to use dselect under LANG=de_DE.ISO-8859-1,
> > LANG=th_TH.TIS620, or LANG=ru_RU.KOI8-R locales.  Then it will
> > fail to display my name.  It is just like I use dselect under
> > LANG=ja_JP.eucJP and it fails to display ISO-8859-1 letters.
> 
> This is glibc deficiency. Decent i18n would provide suitable
> transliteration (try this command:
> filterm - isolatin1-ascii
> and display some latin1 texts - if something similar
> would be in glibc, half of the problem would be gone)

I cannot think of any algorithmic transliteration from CJK
ideographs to the Latin alphabet.  One ideograph often has
several readings (at least in the case of Japanese).

Thus, we need ASCII messages written by humans, not
automatic transliteration by computer.
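
To illustrate the gap, here is a minimal Python sketch.  It is not
what filterm or glibc does; it assumes a naive approach (Unicode NFKD
decomposition followed by dropping every non-ASCII character), and
the function name naive_ascii is made up for this example.

```python
import unicodedata


def naive_ascii(text: str) -> str:
    """Decompose accented letters (NFKD), then drop everything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Accented Latin letters decompose into a base letter plus a combining
# mark, so the base letter survives the ASCII filter.
print(naive_ascii("José Müller"))

# CJK ideographs have no such decomposition, so nothing is left.
print(repr(naive_ascii("久保田智広")))
```

For Latin-1 text this kind of mechanical fallback gives something
readable; for ideographs it gives an empty string, and no table lookup
can choose the correct reading of a name.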


> Ideally, you would use LANG=ja_JP.UTF8 and dselect would
> display ISO-8859-1, ISO-8859-2, Kanji and whatever there is.
> 
> I would use LANG=sk_SK.UTF8 and dselect will still display
> everything.
> 
> someone else might use LANG=ru_RU.UTF8 and dselect still works.
> 
> I see this as a worthy goal.

Likewise, with LANG=ja_JP.eucJP, dselect will display
Latin/Greek/Cyrillic alphabets and a part of the CJK ideographs but
will fail to display Thai/Hebrew/Arabic letters (or a transliteration
may be available).  With LANG=sk_SK.ISO-8859-2, dselect will display
Latin alphabets and transliterations of Greek/Cyrillic alphabets and
fail to display CJK ideographs.
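
A rough way to check such coverage is to try encoding a sample string
into the locale's charset.  The Python sketch below assumes that
"can be encoded" is a fair stand-in for "can be displayed"; the helper
name displayable and the sample strings are made up for this example.

```python
def displayable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "ASCII": "Kubota",
    "Kanji": "久保田",
    "Thai": "ไทย",
}

# Python codec names standing in for the eucJP, ISO-8859-2 and TIS-620
# locale charsets.
for codec in ("euc_jp", "iso8859_2", "tis_620"):
    for script, text in samples.items():
        print(codec, script, displayable(text, codec))
```

Only the ASCII sample is reported as displayable under all three
charsets.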

I admit that it is nice.  However, before we dream of a nice system,
we should think about an error-free system.

I think that having two fields, Maintainer: and Maintainer-utf8:,
in the Packages file can be a solution.
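
A frontend could then choose between the two fields roughly as in the
Python sketch below.  This is only an illustration of the idea: no
tool implements a Maintainer-utf8: field today, and the function name
pick_maintainer, the dict representation of a stanza, and the package
name "example" are all assumptions.

```python
def pick_maintainer(stanza: dict, charset: str) -> str:
    """Use Maintainer-utf8 only when the locale charset is UTF-8."""
    normalized = charset.replace("-", "").replace("_", "").lower()
    if normalized == "utf8" and "Maintainer-utf8" in stanza:
        return stanza["Maintainer-utf8"]
    # Every other charset gets the ASCII field, which always exists.
    return stanza["Maintainer"]


stanza = {
    "Package": "example",  # hypothetical package
    "Maintainer": "Tomohiro KUBOTA <kubota@debian.org>",
    "Maintainer-utf8": "久保田智広 <kubota@debian.org>",
}

print(pick_maintainer(stanza, "UTF-8"))
print(pick_maintainer(stanza, "eucJP"))
```

The ASCII field is the safe default, so a tool that knows nothing
about the new field keeps working unchanged.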



> Sure, but translated from _what_ ?
> 
> We have one original, and translations. Now by translations I do not
> mean only language translations, but also charset changes. One of the
> translations can be english.ASCII (or call it whatever you want).

Right.  The requirement for the original message is that it must be
easily read by people all over the world.  Thus, English is the most
suitable candidate (for historical and political reasons, rather than
Esperanto).  Moreover, I think the use of non-Latin letters (such as
Cyrillic, Greek, CJK ideographs, Thai, Hebrew, Arabic, and so on)
should be avoided unless there is a very strong reason to use them.
This is because the Latin alphabet is the only script which we can
expect people around the world to be able to read.  Though writing
developers' names in their native letters sounds fantastic, it carries
the risk that people elsewhere in the world cannot read them.


> Original database should contain all the information (why constrain
> yourself?), and messages translated to ASCII user's locale will be
> transliterated to ascii - preferably on the fly, via improved iconv().

The original database should be written in English.  OK, someone might
want to use non-Latin alphabets in very limited cases.  In such a case,
the writer of the database will have to choose between two options:
(1) use non-Latin alphabets and accept the risk that people around the
world may not be able to read them or that algorithmic transliteration
may fail, or (2) give up using non-Latin alphabets.

I imagine every writer wants their messages to be read by as many
people in the world as possible.  Thus, I write my signature in the
Latin alphabet.  Debian developers who use ISO-8859-1 letters in their
Maintainer: field and who don't discuss i18n support are, I imagine,
simply unaware of this problem.  I don't think they have a deliberate
intention to use non-ASCII letters and to improve i18n around Debian.


> By your reasoning, you should demand that Packages should be in
> UPPERCASE, because there are still terminals and encodings in use that
> do not support lowercase letters, and UPPERCASE is common to greater
> set of encodings than complete ASCII.

The key point is that all popular encodings are upward compatible with
ASCII.  Debian doesn't support UPPERCASE-only encodings so far, and I
think nobody wants to do so.
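
This compatibility is easy to check.  The Python sketch below encodes
an ASCII-only line into several charsets and compares the bytes; the
list of codecs is only a sample I picked, not a complete survey, and
the helper name ascii_bytes_preserved is made up for this example.

```python
ASCII_LINE = "Maintainer: Tomohiro KUBOTA <kubota@debian.org>"

# Python codec names for a sample of the charsets mentioned in this thread.
CODECS = ["utf_8", "iso8859_1", "iso8859_2", "koi8_r", "euc_jp", "tis_620"]


def ascii_bytes_preserved(text: str, codecs: list) -> bool:
    """True if text encodes to the same bytes as plain ASCII in every codec."""
    reference = text.encode("ascii")
    return all(text.encode(codec) == reference for codec in codecs)


print(ascii_bytes_preserved(ASCII_LINE, CODECS))
```

An ASCII-only field therefore needs no conversion at all under any of
these locales.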


---
久保田智広 Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


