
Re: support for multilingual Packages files?



On Mon, 30 Jul 2001, Radovan Garabik wrote:

> 2) Localized fields in debian/control, such as Description-fr etc.
> This is a different issue than 1), and has not been much discussed.
> Probably the same approach that debconf follows could be adopted.
> Notice that even in English, there is an occasional need for
> diacritics.

How is there a *need* for diacritics?  Most of the English-speaking world has
been getting by just fine on ASCII for the past 20 years or so; in fact, this
has been the source of plenty of integration problems in the past when dealing
with multinational issues.

I haven't seen any support for this notion from those who speak and write
English as their first language.  Thus, I would expect the primary
Description: field to consist of 7-bit ASCII only (the C locale).  People who
really want to spice up their English descriptions can use the
Description-en field, as suggested elsewhere.
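To make the proposed rule concrete, here is a minimal sketch of how a checker might enforce it. It assumes the field name and value have already been parsed out of debian/control. The function name `description_ok` is hypothetical and is not part of any Debian tool. It only enforces the 7-bit restriction on the primary Description: field and leaves Description-<lang> fields alone.

```python
def description_ok(name: str, value: str) -> bool:
    """Hypothetical check: the primary Description field must be 7-bit
    ASCII (code points 0-127); localized Description-<lang> fields such
    as Description-en are not restricted here."""
    if name == "Description":
        return value.isascii()  # str.isascii() needs Python 3.7+
    return True


print(description_ok("Description", "text editor"))
print(description_ok("Description", "na\u00efve text editor"))
print(description_ok("Description-en", "na\u00efve text editor"))
```

A plain-ASCII description passes. The same text with a diacritic is rejected in Description: but accepted in Description-en.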


>   b) require using utf-8
>      advantages:
>      - ASCII is subset, so all tools will work on this (those that
>        do not are buggy and should be fixed)

Euhh... It's true that ASCII is a subset of UTF-8.  It does not follow that
all tools will be able to work with UTF-8 successfully.  Anything that needs
to know the length of a string will have difficulty if it doesn't know the
charset+encoding.  Anything that makes assumptions about character width will
break.  Yes, these are bugs in the tools, and they need to be flushed out --
but they need to be flushed out BEFORE anyone starts considering using UTF-8
in the Packages file!
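The length and width problems are easy to demonstrate. The sketch below compares three different "lengths" of the same string: UTF-8 bytes, code points, and terminal columns. A tool that assumes these are all equal will misbehave. The `display_width` helper is a simplified, hypothetical approximation of what the C library's `wcwidth()` computes. It is not a complete implementation.

```python
import unicodedata


def display_width(text: str) -> int:
    """Rough terminal column count: wide/fullwidth East Asian characters
    take two columns, combining marks take none, everything else one.
    A simplification of wcwidth(); it ignores control characters etc."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


for word in ["editor", "caf\u00e9", "\u65e5\u672c"]:
    encoded = word.encode("utf-8")
    print(f"{word!r}: {len(encoded)} bytes, {len(word)} code points, "
          f"{display_width(word)} columns")

# Truncating by byte count can cut a multibyte character in half,
# leaving a byte sequence that is no longer valid UTF-8.
try:
    "caf\u00e9".encode("utf-8")[:4].decode("utf-8")
except UnicodeDecodeError as err:
    print("byte-truncated string is invalid UTF-8:", err.reason)
```

For plain ASCII all three numbers agree. For "café" the byte count differs from the other two. For the two-character Japanese word "日本" the output shows 6 bytes, 2 code points, and 4 columns.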

If you want to see UTF-8 become a reality, I would suggest focusing on
bringing these tools up to snuff rather than debating the question on
debian-devel.  Until someone has identified the issues with the existing tools
and attempted to address those issues, this is all theoretical discussion.
That includes identifying issues for users whose default locale is a multibyte
non-Unicode locale.

>      - people over the world are not familiar with non-latin scripts.
>        It seems reasonable to require latin-script (not necessarily ASCII)
>        versions of names to be included alongside the original ones
>        See [1]

Yes, it should be ASCII.  ASCII is the common denominator that's present in
all character sets, and it's the only character set that we can guarantee will
be portable across locale boundaries.  There will be people using non-UTF8
terminals, mail readers, etc., for years to come, and filling our archive with
UTF8 text won't solve that problem.  We should not be pushing Unicode at the
expense of our users' ability to make use of the system.
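The portability argument can be illustrated by simulating what a non-UTF-8 terminal does with UTF-8 bytes. The sketch assumes the archive stores text as UTF-8 and that the terminal interprets each incoming byte in its own 8-bit charset. `render_on_terminal` is a hypothetical helper written for this illustration. ASCII text comes through unchanged in every charset. Accented text turns into mojibake; for example, "café" shows up as "cafÃ©" on a Latin-1 terminal.

```python
def render_on_terminal(text: str, terminal_charset: str) -> str:
    """Hypothetical simulation: encode text as UTF-8 (as stored in the
    archive) and decode the bytes the way a terminal using
    terminal_charset would interpret them."""
    return text.encode("utf-8").decode(terminal_charset, errors="replace")


for sample in ["text editor", "caf\u00e9"]:
    for charset in ["utf-8", "latin-1", "koi8-r"]:
        print(f"{charset:8} {render_on_terminal(sample, charset)!r}")
```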

Steve Langasek
postmodern programmer


