
Re: support for multilingual Packages files?



On Fri, Jul 06, 2001 at 10:44:41PM +0900, Tomohiro KUBOTA wrote:
> Hi,
> 
> I'd like to state a few premises.
> 
> 1. An ideal system would be aware of the LC_CTYPE locale category and
>    support almost all popular encodings (ISO-8859-*, KOI8-R, KOI8-U,
>    EUC-JP, CN-GB, EUC-KR, Big5, TIS-620, TCVN, and so on).

Agreed.
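To make premise 1 concrete: an LC_CTYPE-aware program asks the C library which encoding the user's locale selects and converts its output to that encoding, whatever it happens to be. A minimal sketch, assuming Python 3 on a Unix system (locale.nl_langinfo is not available on Windows); the helper names locale_codeset and to_terminal are hypothetical, and printing '?' for characters the locale cannot represent is just one possible policy:

```python
import codecs
import locale

def locale_codeset() -> str:
    """Return the name of the encoding selected by the LC_CTYPE category."""
    try:
        # Adopt the user's LC_ALL / LC_CTYPE / LANG setting, for LC_CTYPE only.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed; stay in "C".
        locale.setlocale(locale.LC_CTYPE, "C")
    # e.g. "ISO-8859-1", "KOI8-R", "EUC-JP", "TIS-620" or "UTF-8"
    return locale.nl_langinfo(locale.CODESET)

def to_terminal(text: str) -> bytes:
    """Encode text for the current locale, using '?' for unrepresentable characters."""
    codeset = locale_codeset()
    try:
        codecs.lookup(codeset)
    except LookupError:
        codeset = "ascii"  # unknown codeset name: fall back to the common subset
    return text.encode(codeset, errors="replace")

print(locale_codeset())
print(to_terminal("Description: package management front-end"))
```

The same program then behaves correctly under LANG=de_DE.ISO-8859-1, ru_RU.KOI8-R or ja_JP.eucJP without any encoding being hard-coded.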

> 
> 2. Today UTF-8 support on Linux is very poor and improving it may take
>    more than a few years.  (For example, the Linux console's Unicode
>    support [via unicode_start] is too poor to display CJK ideographs.)

The Linux console is mostly limited by the VGA hardware, which cannot display
more than 512 different glyphs at once (and the framebuffer console is just
a software VGA character buffer, nothing more).
But xterm is better, and getting better all the time.

The framebuffer console should be rewritten, I agree.

> 
> 
> > sure, go ahead (as long as you use international encoding, i.e. UTF-8)
> > (of course, "good manners" say you should include a transcription
> > of your name in Latin script - not necessarily ASCII! - as well,
> > for the benefit of us kanji-challenged)
> 
> No.  Someone may want to use dselect under LANG=de_DE.ISO-8859-1,
> LANG=th_TH.TIS620, or LANG=ru_RU.KOI8-R locales.  Then it will
> fail to display my name.  It is just like when I use dselect under
> LANG=ja_JP.eucJP and it fails to display ISO-8859-1 letters.
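The failure described here is easy to reproduce. A minimal sketch in Python 3, using the word 日本語 ("Japanese") as a hypothetical stand-in for a name written in kanji, shows what a program that replaces unencodable characters would emit under the encodings of a few of these locales:

```python
# Hypothetical stand-in for a maintainer's name written in kanji.
name = "日本語"

for enc in ("iso8859-1", "koi8-r", "tis-620", "euc-jp"):
    # What a program would emit in that locale if it replaced
    # every character the encoding cannot represent with '?'.
    shown = name.encode(enc, errors="replace")
    print(f"{enc:10} {shown!r}")
```

Only the EUC-JP case keeps the text; under the other three encodings the name degrades to question marks.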

This is a glibc deficiency. Decent i18n would provide suitable
transliteration (try this command:
filterm - isolatin1-ascii
and display some Latin-1 text; if something similar
were in glibc, half of the problem would be gone).
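Something in the spirit of that filter can be sketched with Python 3's unicodedata module. This is not how filterm is implemented; it swaps in a different technique, Unicode compatibility decomposition (NFKD) followed by dropping combining accents, plus a small hypothetical table for letters such as ß that have no decomposition. Anything still outside ASCII becomes '?':

```python
import unicodedata

# Hypothetical extra rules for Latin-1 letters that have no Unicode decomposition.
EXTRA = {"ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
         "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th"}

def to_ascii(text: str) -> str:
    """Transliterate text to ASCII, using '?' where no approximation is known."""
    out = []
    for ch in text:
        if ch in EXTRA:
            out.append(EXTRA[ch])
            continue
        # NFKD turns "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
        decomposed = unicodedata.normalize("NFKD", ch)
        # Drop the combining accents, keeping only the base letter(s).
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(base if base.isascii() else "?")
    return "".join(out)

print(to_ascii("naïve café, Straße, Øresund, 日本"))
```

For that sample it prints "naive cafe, Strasse, Oresund, ??": Latin letters survive in a readable form, while kanji have no ASCII approximation at all.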

Ideally, you would use LANG=ja_JP.UTF8 and dselect would
display ISO-8859-1, ISO-8859-2, Kanji and whatever there is.

I would use LANG=sk_SK.UTF8 and dselect would still display
everything.

Someone else might use LANG=ru_RU.UTF8 and dselect would still work.
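The claim that a single UTF-8 locale covers every script at once, while each legacy encoding covers only its own, can be checked directly. A sketch in Python 3; the sample string is made up and the list of legacy encodings is only a selection:

```python
# One string mixing Latin-1, Latin-2 (Slovak), kanji, Cyrillic and Thai.
mixed = "café ľščť 日本語 Привет ไทย"

def fits(text: str, encoding: str) -> bool:
    """True if text can be encoded in the given encoding without loss."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# UTF-8 round-trips the whole string without loss.
assert mixed.encode("utf-8").decode("utf-8") == mixed

legacy = ["iso8859-1", "iso8859-2", "koi8-r", "euc-jp", "tis-620"]
print([enc for enc in legacy if fits(mixed, enc)])
```

The printed list is empty: each of these encodings rejects at least one character of the sample, while UTF-8 holds all of it.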

I see this as a worthy goal.

> 
> 
> > > ASCII is international (i.e., shared all around the world) but ISO-8859-1
> > 
> > ASCII is American, not international (American Standard Code ....)
> > It just happened that USA acquired world supremacy and is forcing
> > its standards to everyone else.
> 
> Then I call it ISO 646.  It is identical to ASCII.  It can also
> be called "lower part of ISO 8859" or "7bit part of Unicode".
> Anyway, all popular encodings which are used for LC_CTYPE in Debian
> share ISO 646 (or ASCII) as a subset.  This is why all non-translated
> messages and texts should be written in ISO 646.  Non-translated
> messages have a chance to be displayed in every locale.  Thus
> non-translated messages have to use a character set common to all
> encodings, and that is ISO 646.
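The shared-subset claim can be verified mechanically. Strictly speaking, it is the International Reference Version of ISO 646 that is identical to ASCII; national ISO 646 variants replace some punctuation characters with local letters. A short Python 3 check, using a hypothetical ASCII-only Packages-style entry and the encodings mentioned in this thread:

```python
# Hypothetical Packages-style text restricted to ASCII (ISO 646 IRV) characters.
ascii_text = "Package: dselect\nDescription: package management front-end\n"

# Encodings named in this thread, plus UTF-8 ("gb2312" is Python's name for EUC-CN).
encodings = ["iso8859-1", "iso8859-2", "koi8-r", "koi8-u", "euc-jp",
             "gb2312", "euc-kr", "big5", "tis-620", "utf-8"]

reference = ascii_text.encode("ascii")
for enc in encodings:
    encoded = ascii_text.encode(enc)
    # Same bytes in every encoding, and they decode back to the same text.
    assert encoded == reference and encoded.decode(enc) == ascii_text, enc
print("ASCII text is byte-identical in", len(encodings), "encodings")
```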

Sure, but translated from _what_?

We have one original, and translations. Now by translations I do not
mean only language translations, but also charset changes. One of the
translations can be english.ASCII (or call it whatever you want).

The original database should contain all the information (why constrain
ourselves?), and messages translated for an ASCII user's locale would be
transliterated to ASCII, preferably on the fly, via an improved iconv().
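On-the-fly transliteration of this kind can be sketched without touching iconv() itself. The version below is Python 3 rather than C: it registers a custom encoding error handler (the name "translit-sketch" and the helper for_locale are hypothetical) that keeps every character the target charset can hold and replaces only the rest with an ASCII approximation. glibc's iconv() accepts a //TRANSLIT suffix on the target charset name for a similar purpose.

```python
import codecs
import unicodedata

def _translit_handler(exc):
    """Encoding error handler: replace unencodable characters with ASCII look-alikes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        # NFKD splits a letter from its accents; drop the accents (combining marks).
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        pieces.append(base if base.isascii() else "?")
    # Return the ASCII replacement and the position where encoding resumes.
    return "".join(pieces), exc.end

# Hypothetical handler name, registered once for the whole process.
codecs.register_error("translit-sketch", _translit_handler)

def for_locale(text: str, encoding: str) -> bytes:
    """Encode text for a user's charset, transliterating only what it cannot hold."""
    return text.encode(encoding, errors="translit-sketch")

original = "Pozdrav z Košíc: ľudia, čaj, café"
for enc in ("utf-8", "iso8859-2", "iso8859-1", "koi8-r", "ascii"):
    print(f"{enc:10} {for_locale(original, enc)!r}")
```

The ISO-8859-2 output keeps every letter, ISO-8859-1 loses only š, ľ and č, and the ASCII output is b'Pozdrav z Kosic: ludia, caj, cafe'. The original text itself is never reduced.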

(of course, having _proper_ _working_ _complete_ Unicode
support in the console, X and applications would make this a non-issue)

Just because ASCII is today's lowest common denominator,
we should not cripple ourselves and force everything to
fit into it when a solution is emerging.

By your reasoning, you should demand that Packages be in
UPPERCASE, because there are still terminals and encodings in use that
do not support lowercase letters, and UPPERCASE is common to a larger
set of encodings than full ASCII.


We should not force ourselves to use ASCII all the time, but rather
look for better solutions (if the Japanese had not done this, there would be
no eucJP (and similar) encodings, and they would all be using romaji
by now).

> 
> 
> > But good i18n support will mean UTF-8 aware curses and slang, which in turn
> > means dselect can display packages' descriptions in UTF-8 without 
> > any problems, which in turn means there would be no reason to limit
> > Packages file to ASCII, which in turn means we should mandate all
> > the non-ascii fields to change into UTF-8, which is part of what
> > we are discussing now. Whoa, what a sentence :-)
> 
> UTF-8 is merely one of the many encodings which Debian will
> have to support.  Yes, good i18n support will _include_ UTF-8 aware
> curses and slang.  Curses and slang will have to support not only
> UTF-8 but also any other popular encodings.  Having UTF-8 support
> should not mean that software works well only under UTF-8.  That
> would be against LC_CTYPE support.

Agreed. 
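One concrete thing that UTF-8 (or any multibyte) awareness in curses and slang has to get right is that byte count, character count and screen columns all differ: kanji occupy two columns, combining accents none. In C this is what wcwidth() reports; below is a rough Python 3 approximation based on unicodedata.east_asian_width that treats East Asian Ambiguous characters as narrow. The helper name display_width is hypothetical:

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate the number of terminal columns text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        # Wide (W) and Fullwidth (F) characters, e.g. kanji, take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

for s in ("dselect", "café", "日本語"):
    print(f"{s}: {len(s.encode('utf-8'))} bytes, {len(s)} chars, {display_width(s)} columns")
```

For "日本語" this reports 9 bytes, 3 characters and 6 columns; "café" gives 5 bytes, 4 characters and 4 columns. A curses library that assumed one byte per column would misalign both.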

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


