
Re: support for multilingual Packages files?



On Mon, 16 Jul 2001, Tomohiro KUBOTA wrote:

>Hi,
>
>At Sun, 15 Jul 2001 20:22:21 +0200,
>Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk> wrote:
>
>>> Thus, even in future when UTF-8 support will be fully implemented, we
>>> should use ASCII for default messages.
>>
>> This is the main point where we disagree.
>> I am glad we finally pinpointed this out.
>
>I see.  Let's discuss this point.
>
>Well, for the maintainer's name, we agreed that both ASCII and UTF-8
>versions should be supplied.  Strictly speaking, the choice between
>them can be made based on the LC_CTYPE locale: in UTF-8 locales, the
>UTF-8 version can be used.  By "default" I mean the "C" locale, which
>is ASCII.

Call the ASCII version the RFC-822 name, just for the hell of it.

From RFC 822:

     Each header field can be viewed as a single, logical line of
     ASCII characters, comprising a field-name and a field-body.

ASCII and RFC-822 are equivalent here; it's just that calling the ASCII
name the RFC-822 name reminds people of the reasoning behind it.
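
To make the LC_CTYPE idea concrete, here is a minimal sketch in Python.
The function names and the sample names are made up for illustration,
and nothing here is how dselect actually works; it only shows the
"UTF-8 name in UTF-8 locales, RFC-822 name everywhere else" rule.

```python
import codecs
import locale


def current_codeset() -> str:
    """Codeset of the LC_CTYPE locale taken from the environment (POSIX only)."""
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


def pick_maintainer_name(rfc822_name: str, utf8_name: str, codeset: str) -> str:
    """Return the UTF-8 name only when the codeset really is UTF-8.

    Any other codeset (including "C"/ASCII, EUC-JP, or an unknown one)
    gets the RFC-822 (ASCII) name, which every locale can display.
    """
    try:
        is_utf8 = codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        # Unknown codeset: be conservative and fall back to ASCII.
        is_utf8 = False
    return utf8_name if is_utf8 else rfc822_name
```

A caller on a POSIX system would use something like
pick_maintainer_name("Jan Novak", "Ján Novák", current_codeset()).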

>For Description fields, using ASCII for the default Description: field
>is mandatory for supporting various locales, because it can be used
>in all locales when a translation is not supplied.  Your way will limit
>dselect (and other Packages-related software) to running only under
>UTF-8 locales.  On the other hand, the present version of dselect runs
>in any locale (except for some violators who use non-ASCII characters
>in their Maintainer: and Description: fields).  This means dselect
>would degrade.
>
>Do you want to abolish all locales other than UTF-8?  Though I
>think that might be impossible, I can only say that first we
>have to add UTF-8 as an additional locale which Debian supports.
>Once we have completed UTF-8 support and some years of experience
>have proved that it is mature enough, we can discuss
>whether or not to abolish non-UTF-8 locales.

Perhaps add "preferred charset maintainer name" (PCMN) and "preferred
charset" (PC) fields, so that the PCMN can be in any arbitrary charset,
so long as that charset is named in the PC field?  This could also clue
in various parts of the installer that charset translation inside the
package may be required for optimal usability.
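
A rough sketch of how a frontend might read such fields, in Python.
The field names "Preferred-Charset" and "Preferred-Charset-Maintainer"
are hypothetical spellings of the PC and PCMN fields suggested above,
not anything dpkg defines today, and the parser is deliberately
simplified (it skips continuation lines).

```python
def parse_stanza(raw: bytes) -> dict[bytes, bytes]:
    """Split one control stanza into lowercase field name -> raw value bytes."""
    fields = {}
    for line in raw.splitlines():
        # Skip blank lines and continuation lines (simplification).
        if not line.strip() or line[:1] in (b" ", b"\t"):
            continue
        key, sep, value = line.partition(b":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def maintainer_for_display(raw: bytes) -> str:
    """Decode the PCMN with the charset named in PC, else use the ASCII name."""
    fields = parse_stanza(raw)
    ascii_name = fields.get(b"maintainer", b"").decode("ascii", errors="replace")
    charset = fields.get(b"preferred-charset", b"").decode("ascii", errors="replace")
    native = fields.get(b"preferred-charset-maintainer")
    if not charset or native is None:
        return ascii_name
    try:
        return native.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset or bytes that do not fit it: RFC-822 name still works.
        return ascii_name
```

The point of the design is that the RFC-822 Maintainer: field stays
mandatory, so a frontend that does not know the charset in PC loses
nothing.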

>
>> He did include proper Content-Type, and used quoted-printable
>> encoding.
>> So, his message was in plain ASCII after all :-)
>
>Sorry, I confirmed it.  The mail contained "charset=ISO-8859-1".
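
For anyone wondering how a mail can be "charset=ISO-8859-1" and plain
ASCII on the wire at the same time, here is a small Python illustration
using the standard quopri module.  The sample word is made up; it is
not taken from the mail in question.

```python
import quopri


def to_quoted_printable(text: str, charset: str) -> bytes:
    """Encode text in the given charset, then quoted-printable the bytes."""
    return quopri.encodestring(text.encode(charset))


def from_quoted_printable(body: bytes, charset: str) -> str:
    """Reverse the transfer encoding, then decode with the declared charset."""
    return quopri.decodestring(body).decode(charset)


body = to_quoted_printable("Müller", "iso-8859-1")
print(body)            # b'M=FCller'
print(body.isascii())  # True: only ASCII bytes travel through the mail system
print(from_quoted_printable(body, "iso-8859-1"))
```

The transfer is pure ASCII, but a reader still needs a terminal or
mail client that can display ISO-8859-1 after decoding.
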
>
>
>> And, it is kind of difficult to discuss proper usage of
>> German umlauted letters without writing them....
>> That's why I am in favour of implementing full Unicode support -
>> people would be able to exchange mails like this without
>> problems (wouldn't you like to?)
>
>Well, though it is important for mail client developers to support
>UTF-8 mail, we still cannot assume that people all over the world
>use UTF-8-enabled terminals and mail clients.  For example, I
>usually use an SSH client (terminal) for Windows in EUC-JP mode.
>(The SSH client has three modes: EUC-JP, Shift_JIS, and ISO-2022-JP.)
>
>Even if I agree that almost everyone in the world will come to use
>UTF-8-enabled terminals and mail clients within ten years, please
>don't use ISO-8859-1 characters NOW.
>
>
>> Careful here.
>> Most languages do not have a systematic way to write names in ASCII.
>> Slovak (and Hungarian) certainly does not.
>> The most "semi-official" way of transcribing Russian names
>> (used by the US Library of Congress) uses diacritics over Latin letters(!)
>> (and no, you cannot just strip them down - it changes the
>> pronunciation completely)
>>
>> In a way, you Japanese are lucky :-)
>
>I know many Russian names written in ASCII characters.  I also have
>examples of ASCII transliteration of Russian sentences.  Please read
>the support.ru.pl file in the language-env source package.
>
>When I was a student, I had a Slovak member whose name uses a non-
>ASCII character like "c" with "v"-like mark above.  I issued a mail
>account for him and he used "c" for the character.
>
>Anyway, using non-ASCII characters means the names cannot be read in
>some locales.
>
>
>> It is comparable to the situation when you would be forced to change
>> some characters from your name written in hiragana, so that it has
>> mostly similar, but not the same pronunciation, and fits into some
>> subset of proper hiragana, just because the computer system you are
>> using is limited to that subset.
>
>Some hiragana and katakana characters can take a "voiced mark"
>or a "semi-voiced mark".  It is wrong to write such hiragana without
>the voiced mark.  I imagine this is similar to Latin letters with
>additional marks.  When we used 8-bit computers that could not handle
>precomposed katakana with a voiced mark, we wrote the katakana followed
>by a separate voiced mark.
>
>If someone really cannot stand his/her "wrong" name, a similar approach
>might be used.  (However, I think many people in the world have
>experienced being forced to use the ASCII charset for international
>communication.)
>
>---
>Tomohiro KUBOTA <kubota@debian.org>
>http://www.debian.or.jp/~kubota/
>"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
>
>
>

-- 
I can be immature if I want to, because I'm mature enough to make my own
decisions.

Who is John Galt?  galt@inconnu.isu.edu


