Re: support for multilingual Packages files?

To: debian-devel@lists.debian.org
Subject: Re: support for multilingual Packages files?
From: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>
Date: Mon, 16 Jul 2001 20:03:40 +0200
Message-id: <20010716200340.A25929@melkor.dnp.fmph.uniba.sk>
In-reply-to: <878zhprnmx.wl@surfchem0.riken.go.jp>

On Mon, Jul 16, 2001 at 10:48:06PM +0900, Tomohiro KUBOTA wrote:
> Hi,
> 
> At Sun, 15 Jul 2001 20:22:21 +0200,
> Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk> wrote:
> 
> >> Thus, even in future when UTF-8 support will be fully implemented, we
> >> should use ASCII for default messages.
> > 
> > This is the main point where we disagree.
> > I am glad we finally pinpointed this out.
> 
> I see.  Let's discuss on this point.
> 
> Well, for maintainer's name, we agreed that both ASCII and UTF-8 versions
> should be supplied.  Strictly speaking, choice of them can be done

Well, not really... my intention was to provide both the original
name (for non-Latin scripts) and a transcription into Latin script
(not necessarily ASCII), because, as you said, people all over the world
are expected to read Latin script, but not any other.
Latin-script names can stay as they are.

> based on LC_CTYPE locale.  In UTF-8 locales, it can use UTF-8 version.
> I mean "C" locale by "default".  "C" locale is ASCII.  
> 
> For Description fields, using ASCII for default Description: field
> is a mandatory for supporting various locales because it can be used
> for all locales when translation is not supplied.  Your way will limit
> dselect (and other Packages: -related softwares) to run only under
> UTF-8 locales. 

How? Dselect will just fail to display the non-ASCII characters,
but otherwise should work.

One question: when you encounter 8-bit chars (let's say latin1) in your
EUC-JP terminal, do they display as question marks, as some random
garbage, or does it render the whole terminal unusable?
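
To make the "question marks" case concrete, here is a minimal Python sketch
of one way a display path limited to ASCII could degrade. It is only an
illustration: render_for_ascii is a hypothetical helper, not anything
dselect actually does, and the sample string is an assumed spelling.

```python
def render_for_ascii(text: str) -> str:
    """Return text as an ASCII-only display would show it.

    Hypothetical helper: every character outside ASCII is replaced
    by a question mark, the rest is passed through unchanged.
    """
    return text.encode("ascii", errors="replace").decode("ascii")


# Assumed sample: a maintainer name with one accented letter.
print(render_for_ascii("Radovan Garabík"))  # prints "Radovan Garab?k"
```

The name stays readable and nothing else breaks, which is the behaviour
argued for above.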

> On the other hand, present version of dselect run on
> any locales (except for some violaters who use non-ASCII characters
> for their Maintainer: and Description: fields).  This means dselect

It runs even then, just those characters do not display.
And, if one guesses the correct encoding and selects appropriate font,
they do display... now, if UTF-8 were mandated, there would not be a problem
with guessing the encoding.
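
A small Python sketch of the guessing problem, under the assumption that
the field was written in ISO-8859-2 and the reader guesses ISO-8859-1.
The name "Kováč" is just a sample, and is_valid_utf8 is a hypothetical
helper introduced for the illustration.

```python
# The same bytes read under two different guesses.
raw = "Kováč".encode("iso-8859-2")

as_latin2 = raw.decode("iso-8859-2")  # correct guess
as_latin1 = raw.decode("iso-8859-1")  # wrong guess, no error reported

print(as_latin2)  # Kováč
print(as_latin1)  # Kováè


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical check: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# If UTF-8 were mandated, a legacy 8-bit string like this one
# would be detected as invalid instead of being silently misread.
print(is_valid_utf8(raw))                     # False
print(is_valid_utf8("Kováč".encode("utf-8")))  # True
```

The wrong 8-bit guess produces plausible-looking but wrong text, while the
UTF-8 check at least tells you that something is off.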

> degrades.
> 

No change for dselect. Only a bit of improvement over current situation.

> Do you want to abolish all locales other than UTF-8?  Though I 

Not now, it is impossible in the current situation. But, for example,
I would like to use a UTF-8 locale instead of ISO-8859-2, which
I am more or less forced to use now and which gets in my way too often.

> think it might be impossible, I can just say that at first we
> have to add UTF-8 as an additional locale which Debian supports.
> When we complete adding UTF-8 support and some years of experience
> proved that UTF-8 support is mature enough, then we can discuss
> whether we abolish non-UTF-8 locales or not.

I agree.

> 
> 
> > He did include proper Content-Type, and used quoted-printable
> > encoding.
> > So, his message was in plain ASCII after all :-)
> 
> Sorry, I confirmed it.  The mail contained "charset=ISO-8859-1".
> 

Yes, but in quoted-printable encoding, which is a 7-bit encoding.
So the mail arrived as 7-bit ASCII (I know, it makes no
difference to your ability to read it, sorry for the little joke).
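
For the record, the 7-bit claim can be checked with Python's standard
quopri module. This is only a sketch: the sample sentence is made up, and
ISO-8859-1 is taken from the charset mentioned above.

```python
import quopri

# Made-up sample text containing German umlauts.
text = "Müller schreibt über Umlaute"
raw = text.encode("iso-8859-1")       # 8-bit bytes, one per character

encoded = quopri.encodestring(raw)    # quoted-printable form
print(encoded)

# Every byte on the wire is below 128, i.e. plain 7-bit ASCII ...
print(all(b < 128 for b in encoded))  # True

# ... and the original text is still fully recoverable.
decoded = quopri.decodestring(encoded).decode("iso-8859-1")
print(decoded == text)                # True
```

The transport is ASCII, but the reader still needs a terminal or mail
client that can show ISO-8859-1 after decoding, which is the real
complaint.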

> 
> > And, it is kind of difficult to discuss proper usage of
> > german umlauted letters and not writing them....
> > That's why I am in favour of implementing full Unicode support - 
> > people would be able to exchange such mails like this without
> > problems (wouldn't you like to?)
> 
> Well, though it is important for mail clients developers to support
> UTF-8 mail, we still cannot assume that people all over the world
> use UTF-8-enabled terminals and mail clients.  For example, I

It is specified in an RFC, so we can expect better and better support.

> usually use SSH client (terminal) for Windows in EUC-JP mode.
> (The SSH client has three modes of EUC-JP, Shift_JIS, and ISO-2022-JP.)
> 
> Even I agree that almost people in the world will come to use
> UTF-8-enabled terminals and mail clients in ten years, please
> don't use ISO-8859-1 characters NOW.
> 
> 
> > Carefully here.
> > Most languages do not have a systematic way to write names in ASCII.
> > Slovak (and Hungarian) certainly does not.
> > The most "semi-official" way of transcribing Russian names
> > (used by USA Congress library) uses diacritics over latin letters(!)
> > (and no, you cannot just strip them down - it changes the
> > pronunciation completely)
> >
> > In a way, you Japanese are lucky :-)
> 
> I know many Russian names written in ASCII characters.  I also have

But not in any consistent way. And if I see a Russian name in ASCII,
I often have trouble guessing the original form.

> examples of ASCII transliteration of Russian sentences.  Please read
> support.ru.pl file in language-env source package.
> 

I've seen (and written) enough Russian in ASCII to want to have
other options...

> When I was a student, I had a Slovak member whose name uses a non-
> ASCII character like "c" with "v"-like mark above.  I issued a mail
> account for him and he used "c" for the character.
> 

Because he had no other option. Leaving out the caron over "c" changes
the pronunciation a lot (like the difference between English "ch" in
"church" and "ts" in "tsar"), and what is worse, one cannot reconstruct
the original name from the result.
My name has an acute accent over the i. I have to leave it out for the
sake of compatibility (fortunately, this is minor).
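
The "cannot reconstruct" point can be shown in a few lines of Python.
This is a sketch of one common way to strip diacritics (Unicode
decomposition, then dropping combining marks); strip_marks is a
hypothetical helper, not something any Debian tool is known to use.

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Hypothetical helper: drop diacritics from Latin-script text."""
    # NFD splits "č" into "c" followed by a combining caron.
    decomposed = unicodedata.normalize("NFD", s)
    # Keep only the base characters, discard the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("Garabík"))  # Garabik

# Two different letters collapse to the same output, so no function
# can map the stripped form back to the original spelling.
print(strip_marks("č") == strip_marks("c"))  # True
```

Once both letters map to "c", the information about which one was meant
is gone.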

> Anyway, using non-ASCII character causes the names cannot be read in
> some locales.
> 

And using ASCII-only characters causes the names to be wrong in _ALL_
locales. Tough choice.

> 
> > It is comparable to the situation when you would be forced to change
> > some characters from your name written in hiragana, so that it has
> > mostly similar, but not the same pronunciation, and fits into some
> > subset of proper hiragana, just because the computer system you are
> > using is limited to that subset.
> 
> Some of hiragana and katakana characters can have "voiced mark"
> and "semi-voiced mark".  It is wrong to write hiragana which lack
> voiced mark.  I imagine this is similar with Latin alphabets with

You mean nigori and maru? Yes, the situation with accent marks in
Latin alphabets is equivalent.

> additional marks.  When we use 8bit computers which cannot use
> precompiled katakana with voiced mark, we wrote katakana and following
> voiced mark.

you were lucky.. you could write the voiced mark. You cannot
do that with Slovak and carons.

And do you write your name this way in e-mails and similar, because this
ensures compatibility with those 8bit computers?
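
As a side note on what Unicode itself offers here: it treats the two cases
alike. A minimal Python sketch, assuming halfwidth katakana as the stand-in
for the 8-bit "katakana followed by voiced mark" convention:

```python
import unicodedata

# Halfwidth KA followed by the halfwidth voiced sound mark,
# i.e. the "base character, then mark" style of 8-bit systems.
halfwidth = "\uFF76\uFF9E"
composed = unicodedata.normalize("NFKC", halfwidth)
print(composed == "\u30AC")  # True: KATAKANA LETTER GA

# The same "base, then mark" style is available for Latin letters:
# "c" followed by COMBINING CARON composes to "č".
caron = unicodedata.normalize("NFC", "c\u030C")
print(caron == "\u010D")     # True: LATIN SMALL LETTER C WITH CARON
```

This only applies once Unicode is available; it does not help on the
legacy 8-bit charsets discussed above.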

> 
> If someone can really not stand his/her "wrong" name, similar way
> might be used.  (However, I think many people in the world have
> experience to be forced to use ASCII charset for international
> communication purpose.)
> 

Yes, unfortunately....

ja ne

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
