
Re: support for multilingual Packages files?



On Fri, Aug 03, 2001 at 03:55:06PM +0900, Tomohiro KUBOTA wrote:
> Hi,
> 
> At Tue, 31 Jul 2001 11:22:41 +0200,
> Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk> wrote:
> 
> >> However, the fact you need the mixture of languages does not mean that
> >> people in the world need it.  Almost people in the world want to use
> >> their own language.  Some others need to use other languages.
> >
> > Ok, let's dump those others. Majority of people needs just one language.
> > And by the way, when we are already doing this, let's dump debian.
> > Majority of people use RedHat anyway. Or dump the whole linux - 
> > Almost all people in the world use windows anyway.
> 
> I don't understand your point.  My focus was that UTF-8 is better

Well, your reasoning suggested that you do not care about multilingual
people, because they are a minority (no offence intended; I know you
did not mean it this way).

> for multilingual purpose but not almighty.  Sometimes other encodings
> are better.  If we stop supporting non-UTF-8 locales and (it means)
> force users to use UTF-8, such users will be annoyed.  If you really

I was not speaking about forcing users; I was talking about the
default encoding of Packages (I repeat myself: you cannot have
a proper Packages file in any commonly used encoding other than Unicode).

> want all people in the world to use UTF-8, please try to improve
> weak points of Unicode.  Did you read my page

yes, I did

> http://www.debian.or.jp/~kubota/unicode-symbols.html ?  Did I write
> only about yen-sign problem?

No, but almost all of those problems (CJK unification aside) have nothing
to do with Unicode itself; they come from the need to support legacy encodings.

> 
> It is freedom of users to use currently supported locales.  Developers
> must not force them to change their encodings.  What developers can
> do is to supply better UTF-8 support so that users come to think
> about migration to UTF-8.

I absolutely agree.

> 
> 
> > double-width characters? It should not be UTF-8's problem, UTF-8 is a
> > _text_ storage format, not a WYSIWYG word processor format.
> 
> Do you know Unicode Standard Annex #11 East Asian Width?  I am saying
> it is buggy.  http://www.unicode.org/unicode/reports/tr11/

It may be buggy, but most of your objections were based on the fact
that it is not compatible with legacy encodings.

David already commented on this; I could only repeat his reasoning :-)
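
For readers who have not looked at Annex #11: the width property it defines
can be queried from Python's standard unicodedata module. The sketch below
is only an illustration; the helper names and the ambiguous_wide switch are
made up here, and combining characters are ignored for brevity. It shows why
the "Ambiguous" class is where legacy compatibility enters: the same text
takes a different number of columns depending on which convention a
terminal follows.

```python
import unicodedata


def width_class(ch):
    """Return the East Asian Width property of one character (e.g. 'W', 'Na', 'A')."""
    return unicodedata.east_asian_width(ch)


def display_columns(text, ambiguous_wide=False):
    """Rough column count for a terminal.

    Wide ('W') and Fullwidth ('F') characters count as 2 columns.
    Ambiguous ('A') characters count as 2 only when ambiguous_wide is set,
    which is the convention a legacy CJK terminal might follow.
    Combining characters are NOT handled (they are counted as 1 column).
    """
    total = 0
    for ch in text:
        w = unicodedata.east_asian_width(ch)
        if w in ("W", "F") or (w == "A" and ambiguous_wide):
            total += 2
        else:
            total += 1
    return total


if __name__ == "__main__":
    for sample in ("A", "\u3042", "\u03b1"):
        print(repr(sample), width_class(sample))
    print(display_columns("\u03b1\u03b2"))
    print(display_columns("\u03b1\u03b2", ambiguous_wide=True))
```

Greek letters fall into the Ambiguous class, so the last two calls give
different answers for the same string.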

> 
> 
> > Round-trip compatibility with existing legacy encodings? Not a problem
> > in UTF-8 itself. IMHO unicode could have been much simpler if they
> > did not try to keep codepoints for all those legacy encodings.
> 
> The same thing is said for precombined Latin characters like "u"
> with umlaut and so on.  They are introduced only for compatibility
> with legacy encodings such as ISO-8859-*.

And they simplify life a bit. But yes, unicode could have been designed
with combining characters only. It was not.
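
A small illustration of the two representations, using Python's standard
unicodedata module (the function names are chosen here for the example
only). The precomposed "u with umlaut" and the pair "u" plus combining
diaeresis are different code point sequences; normalization maps one to
the other, and only the precomposed form converts directly to ISO-8859-1.

```python
import unicodedata

PRECOMPOSED = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS
COMBINING = "u\u0308"    # "u" followed by COMBINING DIAERESIS


def compose(text):
    """Normalize to NFC: combining sequences become precomposed characters where possible."""
    return unicodedata.normalize("NFC", text)


def decompose(text):
    """Normalize to NFD: precomposed characters become base letter plus combining marks."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    print(PRECOMPOSED == COMBINING)             # different code point sequences
    print(compose(COMBINING) == PRECOMPOSED)    # equal after composition
    print(decompose(PRECOMPOSED) == COMBINING)  # equal after decomposition
    # The precomposed form has a direct ISO-8859-1 counterpart.
    print(PRECOMPOSED.encode("iso-8859-1"))
```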

> 
> 
> > Missing characters? Yes, that IS a problem. So write a proposal to
> > unicode consortium. If they refuse it for no good reason, well, 
> > then there IS a REAL problem.
> 
> When did I talk about missing characters?  I imagine CJK Han Unification

ages ago, about missing Kanji (maybe it was not you, I am too lazy to look
in the archives now)

> problem is similar to missing characters but it is a problem of a
> definition of what is the identity of characters.  And more, we have

Exactly. I could give you some counter-examples from the unification of
Latin letters, but I think you must already have heard a lot about similar
issues.
(Notice: I feel neutral about CJK unification - mostly because I
do not use Han characters :-), but I really understand both sides)

> 
> > But we should not be discussing technical aspects of unicode here,
> > this has already been flamed to death elsewhere.
> 
> Then do you tentatively agree that some people want to use non-UTF-8
> locales?

I never said anything else.

> 
> 
> > someone (I forgot who) else already wrote it: The main reasons why
> > Japanese are against unicode is that they already have their own,
> > well-working, national character encodings, and do not like the idea
> > of changing it to something radically different.
> >
> > I believe he hit the issue directly :-)
> 
> There may or may not be some Japanese people who think so.  However,
> it is not a focus.  Your sentence is only a seed of flame.
> 

Sorry about that, I did not mean it. But I cannot help feeling
there is some truth in it.

...  
> 
> 
> > I never said this. I said that under b), maintainers can decide for
> > themselves whether they write an ASCII-only version, a UTF-8-only
> > version, or both.
> 
> I see.  I am happy to agree that we will have both ASCII and UTF-8
> version of names.  Then, let's discuss further.  I think either of
> them should be mandatory because we should have a policy for encoding

And, because the Packages translation effort seems to be starting up, the
canonical version (the one used as the source for translating) should be UTF-8.

> of Maintainer: field.  I also think ASCII version is a good candidate
> for mandatory field.  Reasons:
> 
> 1. I think the need of UTF-8 field is well understood by the maintainers
>    themselves ("My name is not expressed well in ASCII!").  On the other
>    hand, the need of ASCII field is come from the technical fact that
>    various encodings are used in the world.
> 
> 2. If ASCII field is not
>    mandatory, transliteration method is left for softwares such as
>    dselect. 

or terminal

>    Some softwares may use '?', some softwares may use some
>    transliteration library, and so on so on.  This will surprise users
>    who use such softwares because a maintainer's name differ when using
>    different softwares even under same locale.
> 

That is true. However, I think it is acceptable (all the more so when the
user is warned about the possibility and knows that the correct version can
be obtained by using a UTF-8 terminal).
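
To make point 2 concrete, here is a minimal Python sketch of two fallback
strategies a program might use when it has to show a UTF-8 name on an
ASCII-only display. Both functions and the sample names are hypothetical;
nothing here is taken from dselect or any other existing tool. The point
is only that the two strategies produce different output for the same name.

```python
import unicodedata


def to_ascii_replace(name):
    """Replace every non-ASCII character with '?'."""
    return name.encode("ascii", "replace").decode("ascii")


def to_ascii_strip_marks(name):
    """Decompose (NFKD) and drop whatever is still not ASCII.

    This removes diacritics from Latin letters, but silently deletes
    characters that have no ASCII base letter at all.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # Hypothetical maintainer names, for illustration only.
    for name in ("Jos\u00e9 M\u00fcller", "\u5c71\u7530"):
        print(repr(to_ascii_replace(name)), repr(to_ascii_strip_marks(name)))
```

Neither result is what a maintainer would necessarily choose (a German
speaker might prefer "Mueller" to "Muller", and a name written in Kanji
disappears entirely under the second strategy), which is the argument for
letting the maintainer supply the ASCII form.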

> 3. Generally maintainers don't test their softwares in different locale.
>    Similarly, they don't imagine how their names are displayed in different
>    locale.  They should know it.

If the idea of Packages-ascii catches on, they just need to care about two
locales: ASCII and UTF-8.

> 
> 4. Current and older versions of dselect and so on don't think about
>    transliteration.  
> 
> 
> I don't stick the name of the field, if ASCII version of name is
> supplied systematically.  For example, Maintainer: field can be UTF-8
> version if lintian issues an error if non-ASCII characters is used
> for Maintainer: field and no Maintainer-ascii: field is supplied.

Could be, or vice versa (I do not insist on the name of the field either).
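
The check described above could look roughly like the following. This is
a Python sketch for illustration only: it is not lintian code, the function
names and message texts are made up, and it assumes the control fields have
already been parsed into a dictionary. Maintainer-ascii is the field name
proposed in this thread, not an existing field.

```python
def is_ascii(text):
    """Return True if text contains only ASCII characters."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def check_maintainer(fields):
    """Return a list of problems for the Maintainer-related fields.

    fields: dict mapping control field names to their (decoded) values.
    """
    problems = []
    maintainer = fields.get("Maintainer")
    ascii_name = fields.get("Maintainer-ascii")  # proposed field, hypothetical

    if maintainer is None:
        problems.append("missing Maintainer: field")
        return problems
    if not is_ascii(maintainer) and ascii_name is None:
        problems.append(
            "non-ASCII Maintainer: field without Maintainer-ascii: field")
    if ascii_name is not None and not is_ascii(ascii_name):
        problems.append("Maintainer-ascii: field contains non-ASCII characters")
    return problems


if __name__ == "__main__":
    # Hypothetical control data, for illustration only.
    print(check_maintainer({"Maintainer": "Jos\u00e9 Example <jose@example.org>"}))
```

Swapping the roles of the two fields (an ASCII Maintainer: field plus an
optional UTF-8 companion) would only change which field the non-ASCII test
is applied to.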

> 
> Of course UTF-8 name can contain not only diacritical Latin characters
> but also any characters in Unicode, including Ideogram, Thai, Indic,
> Hebrew, Arab, Hangul, Hiragana, Katakana, Cyrillic, Greek, and so on.
> 
> > Do not take me wrong, I am well aware of your reasons for ASCII only
> > Packages, I just feel a bit more radical :-)
> 
> Radicality sometimes comes from ignorance of minority people.
> However, unfortunately, the world is not so simple as you think.
> 

Well, support for legacy encodings has to stop somewhere (Debian
does not support an uppercase-only locale, right?). I know, though, that
in Japan legacy encodings are very firmly rooted and replacing them is
just impossible.
Speaking for myself, I would gladly give up my ISO-8859-2 locale if I
could use a UTF-8 one instead.

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


