Re: support for multilingual Packages files?

To: debian-devel@lists.debian.org
Subject: Re: support for multilingual Packages files?
From: Tomohiro KUBOTA <tkubota@riken.go.jp>
Date: Mon, 16 Jul 2001 22:48:06 +0900
Message-id: <878zhprnmx.wl@surfchem0.riken.go.jp>
In-reply-to: In your message of "Sun, 15 Jul 2001 20:22:21 +0200" <20010715202221.A4856@melkor.dnp.fmph.uniba.sk>
References: <87elrkfnrz.wl@surfchem0.riken.go.jp> <87g0c0forz.wl@surfchem0.riken.go.jp> <20010715202221.A4856@melkor.dnp.fmph.uniba.sk>

Hi,

At Sun, 15 Jul 2001 20:22:21 +0200,
Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk> wrote:

>> Thus, even in future when UTF-8 support will be fully implemented, we
>> should use ASCII for default messages.
> 
> This is the main point where we disagree.
> I am glad we finally pinpointed this out.

I see.  Let's discuss this point.

Well, for the maintainer's name, we agreed that both ASCII and UTF-8
versions should be supplied.  Strictly speaking, the choice between
them can be made based on the LC_CTYPE locale: in UTF-8 locales, the
UTF-8 version can be used.  By "default" I mean the "C" locale, and
the "C" locale is ASCII.
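
The LC_CTYPE-based choice could be sketched in Python roughly as
follows.  This is only an illustration under assumptions: the function
names and the sample names are hypothetical, and the codeset is passed
in as a plain string (on a POSIX system it could come from
locale.nl_langinfo(locale.CODESET) after calling
locale.setlocale(locale.LC_CTYPE, "")).

```python
import codecs


def is_utf8_codeset(codeset: str) -> bool:
    """Return True if the codeset name refers to UTF-8 (e.g. "UTF-8", "utf8")."""
    try:
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        # Unknown codeset: treat it as non-UTF-8 so that we fall back to ASCII.
        return False


def pick_maintainer_name(ascii_name: str, utf8_name: str, codeset: str) -> str:
    """Use the UTF-8 version only in UTF-8 locales; otherwise use ASCII."""
    return utf8_name if is_utf8_codeset(codeset) else ascii_name


# Hypothetical sample data, not taken from any real Packages file.
print(pick_maintainer_name("Jan Novak", "Ján Novák", "UTF-8"))
print(pick_maintainer_name("Jan Novak", "Ján Novák", "EUC-JP"))
```

The ASCII version acts as the safe answer for every codeset the
function does not recognise as UTF-8.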

For Description fields, using ASCII for the default Description: field
is mandatory for supporting various locales, because it can be used
in all locales when a translation is not supplied.  Your way will limit
dselect (and other Packages-related software) to running only under
UTF-8 locales.  On the other hand, the present version of dselect runs
in any locale (except for some violators who use non-ASCII characters
in their Maintainer: and Description: fields).  This means dselect
would degrade.
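
The fallback argued for here can be sketched as below.  The layout is
an assumption made only for illustration: translations are held in a
dictionary keyed by language code, and the function name, the sample
texts, and the explicit lang/codeset parameters are all hypothetical.

```python
def pick_description(default_ascii, translations, lang, codeset):
    """Return a translation if the locale can display it, else the ASCII default."""
    text = translations.get(lang)
    if text is None:
        return default_ascii  # no translation supplied for this language
    try:
        text.encode(codeset)
    except (UnicodeEncodeError, LookupError):
        return default_ascii  # translation not representable in this locale
    return text


# Hypothetical sample data.
default = "package management tool"
translations = {
    "ja": "パッケージ管理ツール",
    "de": "Paketverwaltung für Debian",
}

print(pick_description(default, translations, "ja", "euc_jp"))
print(pick_description(default, translations, "ja", "iso-8859-1"))
print(pick_description(default, translations, "fr", "iso-8859-1"))
```

Because the default is ASCII, the last two calls still return readable
text even though no usable translation exists for those locales.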

Do you want to abolish all locales other than UTF-8?  Though I
think it might be impossible, I can just say that first we
have to add UTF-8 as an additional locale which Debian supports.
When we have completed adding UTF-8 support and some years of
experience have proved that UTF-8 support is mature enough, then we
can discuss whether to abolish non-UTF-8 locales or not.

> He did include proper Content-Type, and used quoted-printable
> encoding.
> So, his message was in plain ASCII after all :-)

Sorry, but I checked it.  The mail contained "charset=ISO-8859-1".
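
Both statements can hold at once: quoted-printable keeps the bytes on
the wire within ASCII, while the charset parameter says which
characters those bytes stand for after decoding.  A small sketch,
using a made-up body line rather than the actual mail:

```python
import quopri

# Hypothetical body line as it travels on the wire: ASCII bytes only.
wire = b"Gr=FC=DFe aus M=FCnchen"
print(wire.isascii())

# Undo the quoted-printable transfer encoding, then apply the declared charset.
raw = quopri.decodestring(wire)
text = raw.decode("iso-8859-1")
print(raw.isascii())
print(text)
```

After decoding, the reader's terminal still has to be able to display
the ISO-8859-1 characters.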

> And, it is kind of difficult to discuss proper usage of
> german umlauted letters and not writing them....
> That's why I am in favour of implementing full unicode support - 
> people would be able to exchange such mails like this without
> problems (wouldn't you like to?)

Well, though it is important for mail client developers to support
UTF-8 mail, we still cannot assume that people all over the world
use UTF-8-enabled terminals and mail clients.  For example, I
usually use an SSH client (terminal) for Windows in EUC-JP mode.
(The SSH client has three modes: EUC-JP, Shift_JIS, and ISO-2022-JP.)

Even if I agree that almost all people in the world will come to use
UTF-8-enabled terminals and mail clients within ten years, please
don't use ISO-8859-1 characters NOW.

> Carefully here.
> Most languages do not have a systematic way to write names in ASCII.
> Slovak (and Hungarian) certainly does not.
> The most "semi-official" way of transcribing Russian names
> (used by USA Congress library) uses diacritics over latin letters(!)
> (and no, you cannot just strip them down - it changes the
> pronunciation completely)
>
> In a way, you Japanese are lucky :-)

I know many Russian names written in ASCII characters.  I also have
examples of ASCII transliteration of Russian sentences.  Please read
the support.ru.pl file in the language-env source package.

When I was a student, I had a Slovak member whose name used a
non-ASCII character, a "c" with a "v"-like mark above it.  I issued a
mail account for him and he used "c" for that character.
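
Replacing such a letter with its plain base letter can be sketched with
Unicode decomposition.  This is just one possible approach, not a
description of what was done at the time, and the function name and
sample name are hypothetical.

```python
import unicodedata


def to_ascii_fallback(name: str) -> str:
    """Drop combining marks so that, e.g., "c" with caron becomes plain "c"."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Letters with no base-plus-mark decomposition are replaced by "?".
    return stripped.encode("ascii", "replace").decode("ascii")


print(to_ascii_fallback("Kováč"))
```

The result is readable in any locale, at the cost of losing the
information carried by the marks.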

Anyway, using non-ASCII characters means that the names cannot be read
in some locales.

> It is comparable to the situation when you would be forced to change
> some characters from your name written in hiragana, so that it has
> mostly similar, but not the same pronunciation, and fits into some
> subset of proper hiragana, just because the computer system you are
> using is limited to that subset.

Some hiragana and katakana characters can have a "voiced mark" or a
"semi-voiced mark".  It is wrong to write hiragana which lacks its
voiced mark.  I imagine this is similar to Latin letters with
additional marks.  When we used 8-bit computers which could not handle
precomposed katakana with a voiced mark, we wrote the katakana followed
by a separate voiced mark.
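
Unicode normalization gives a modern analogue of writing the base
character followed by its mark.  The sketch below only illustrates the
idea; it does not reproduce the encoding those 8-bit machines actually
used.

```python
import unicodedata

# Precomposed KATAKANA LETTER GA splits into KA + combining voiced sound mark.
ga = "\u30ac"
ga_split = unicodedata.normalize("NFD", ga)
print([unicodedata.name(ch) for ch in ga_split])

# Same for a semi-voiced character: PA splits into HA + semi-voiced mark.
pa_split = unicodedata.normalize("NFD", "\u30d1")
print([unicodedata.name(ch) for ch in pa_split])

# Halfwidth KA followed by the halfwidth voiced mark (two separate characters)
# is folded back into the single precomposed GA by NFKC.
halfwidth = "\uff76\uff9e"
recomposed = unicodedata.normalize("NFKC", halfwidth)
print(recomposed == ga)
```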

If someone really cannot stand his/her "wrong" name, a similar way
might be used.  (However, I think many people in the world have
experience of being forced to use the ASCII charset for international
communication.)

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
