
Re: support for multilingual Packages files?



Hi,

Sorry for the long mail...

At Mon, 30 Jul 2001 18:00:04 +0200,
Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk> wrote:

> but national encodings are just that, national encodings. I cannot count
> how many times I have been frustrated by this, when I needed to mix more
> languages. UTF-8 helped a lot, at least on www pages I can use it freely
> (apart from some support for old www browsers, but that is rather easily
> achievable when I can expect who is accessing my pages)

Do you know that Mule and Emacs have been able to mix many languages
for many years?  They are not based on UTF-8.  OK, UTF-8 is one
such international encoding.

However, the fact that you need a mixture of languages does not mean
that everyone in the world needs it.  Most people in the world want to
use only their own language.  Only some need other languages as well.
As I wrote at http://www.debian.or.jp/~kubota/unicode-symbols.html ,
Unicode has problems when used for Japanese, and I am quite sure that
Unicode cannot supply a usable solution unless MS or IBM goes bankrupt
or resigns its membership of the Unicode Consortium.

Thus, UTF-8 locales are needed for people who mix languages,
while the EUC-JP locale is needed for Japanese.  I don't know much
about other languages.


>> It is YOU who want to avoid confusion of characters with and without
>> diacritics.  Why can you say that all people with Latin-script names
>> want to use a question mark rather than eliminate diacritics?
> Not all. That would be up to the maintainers to decide what
> they want to do with their names.

Good.  Thus, we need an ASCII field where maintainers can write their
preferred ASCII name.


> And? Do you know how to read Slovak letter "ch" ?
> It consists of two pure-ASCII characters, no diacritics.

No, but we understand that "ch" consists of the two characters "c" and
"h".  Nobody confuses "c" with "o".  However, we really don't know
characters with diacritics, and we can mistake one such character for
another.  We may confuse acute, grave, and macron.  We may take the
difference between them for a mere difference of typeface, because we
don't know these diacritics.


[from other mails by Radovan]

>> Yes, it should be ASCII.  ASCII is the common denominator that's present in
> It should, but it could not. For translation effort not to look dumb,
> there is a need for _proper_ maintainers names somewhere. I am trying
> to put it into Packages. If you have other ideas, please tell.

As a temporary measure until a Maintainer-utf8: field is available,
you can put your correct name in the README.Debian file or similar,
with a -*- coding: foobar; -*- line as the first line.
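
To illustrate how a reader could honor such a coding line, here is a
minimal Python sketch.  It is only an assumption about how a tool
might do it, not an existing Debian tool: the function name
decode_readme and the sample name are hypothetical, and an unknown
encoding name would simply raise LookupError.

```python
import re

# Emacs-style cookie, e.g. "-*- coding: iso-8859-2; -*-" (first line only).
CODING_RE = re.compile(rb"-\*-.*?coding:\s*([-\w.]+)")

def decode_readme(raw: bytes, default: str = "ascii") -> str:
    """Decode a file using the encoding named on its first line, if any."""
    first_line = raw.split(b"\n", 1)[0]
    match = CODING_RE.search(first_line)
    encoding = match.group(1).decode("ascii") if match else default
    # Bytes that are invalid in the chosen encoding become U+FFFD.
    return raw.decode(encoding, errors="replace")

# Hypothetical maintainer name, stored in ISO-8859-2.
sample = "-*- coding: iso-8859-2; -*-\nMaintainer: Ján Kováč\n".encode("iso-8859-2")
print(decode_readme(sample))
```

Without the coding line, the same bytes cannot be interpreted
reliably, which is why the line has to come first.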


> there are people using UPPERCASE only terminals and printers now.
> (I am one of them!). It does not mean we should not fill our archive with
> lowercase letters.

Debian does not have an UPPERCASE-only locale.  In short, it does not
support one.  On the other hand, most people in the world use
non-UTF-8 locales.


>> You don't understand. The problems Tomohiro wants fixed, aren't going to be
> I do understand. I told earlier I do not feel competent to comment
> on CJK unification. I can imagine that Japanese feel bad about it,
> and understand their reasoning. OTOH, I equally well understand reasoning
> of the opposite party :-)
> All other problems could be fixed.


No, you misunderstand.  As I wrote before, please read my research
on Unicode at http://www.debian.or.jp/~kubota/unicode-symbols.html
and you will find problems other than Han Unification.  I was talking
about those problems (character width, orthodox conversion tables,
and so on).  BTW, if you don't understand the difference between
CCS (Coded Character Set) and CES (Character Encoding Scheme),
which is an important concept for understanding my research, please
refer to my document http://www.debian.org/doc/manuals/intro-i18n .
I said that some of these problems (other than Han Unification)
will never be fixed because of political disputes within the Unicode
Consortium.
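
As a rough illustration of the CCS/CES distinction (a Python sketch
that is not taken from either document above): the characters below
all belong to one coded character set, JIS X 0208, but ISO-2022-JP,
EUC-JP and Shift_JIS are different encoding schemes that serialize
those same characters into different byte sequences.

```python
# One coded character set (JIS X 0208), three character encoding schemes.
text = "日本語"  # all three characters are in JIS X 0208

encoded = {ces: text.encode(ces) for ces in ("iso2022_jp", "euc_jp", "shift_jis")}
for ces, data in encoded.items():
    print(ces, data.hex())

# ISO-2022-JP wraps the 7-bit JIS X 0208 codes in escape sequences
# (ESC $ B ... ESC ( B); EUC-JP stores the same codes with the high bit set.
jis_codes = encoded["iso2022_jp"][3:-3]
euc_from_jis = bytes(b | 0x80 for b in jis_codes)
print(euc_from_jis == encoded["euc_jp"])
```

Shift_JIS covers the same set through a different arithmetic
transformation, so the character repertoire is identical while the
bytes are not.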


> I did not say it is not bad. But if Tomohiro sees a random garbage kana or
> a question mark in my name, I do not think it will be the end of the world.
> And the same, if someone sees random ISO-8859-2 characters in place of his name,
> it is not the end of the world.

Did you read how bad the garbage can be?  I said it may break the
whole screen (by scrolling).

Is it the end of the world if your name is written in ASCII
characters without diacritics?  I don't understand why you
insist that this problem is more important than the garbage
character problem.


>>       disadvantages:
>>       - maintainers who want to use non-ASCII characters are forced
>>         to supply two versions of descriptions (or names,...).  However,
> or they could decide if they prefer not to include the ASCII version at all,
> so that nobody is confused by incorrect variant of their name (I am talking
> now about latin-script names with diacritics)

In my proposal, maintainers are free to put '?' in the ASCII field in
such cases, just as I wrote before.  Of course maintainers are also free
to choose 'ue' or 'u' (or even literally '&uuml;' or '\"u' or even
'foobar') for ü.  On the other hand, with your idea b) (requiring
UTF-8), maintainers cannot control how their own name is
displayed when their local characters are not available.  Also,
even if you choose '?', you can explicitly show that '?' is your
own choice by supplying the ASCII field.
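
A minimal Python sketch of that fallback, under stated assumptions:
the Maintainer-utf8 field is the one proposed in this thread, using
plain Maintainer as the ASCII field is my assumption, and the function
name and the sample name are hypothetical.

```python
def display_name(fields: dict, terminal_encoding: str) -> str:
    """Return the full name if the terminal can show it, else the ASCII one."""
    ascii_name = fields["Maintainer"]        # chosen by the maintainer
    full_name = fields.get("Maintainer-utf8")
    if full_name is None:
        return ascii_name
    try:
        # The name is displayable only if every character can be encoded.
        full_name.encode(terminal_encoding)
    except UnicodeEncodeError:
        return ascii_name
    return full_name

# Hypothetical maintainer who prefers 'ue' over 'u' or '?'.
fields = {"Maintainer": "Juergen Mueller", "Maintainer-utf8": "Jürgen Müller"}
print(display_name(fields, "utf-8"))
print(display_name(fields, "ascii"))
```

The point is that the fallback text comes from the maintainer, not
from whatever the display software decides to substitute.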

I don't understand why you don't like supplying an ASCII version
of your name.  It seems that you just don't want to face up
to the reality that not everybody in the world can read
your diacritics.

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


