
Re: support for multilingual Packages files?



On Mon, 30 Jul 2001, Radovan Garabik wrote:

>>> 2) Localized fields in debian/control, such as Description-fr etc.
>>> This is a different issue than 1), and has not been much discussed.
>>> Probably the same approach that debconf follows could be adopted.
>>> Notice that even in English, there is an occasional need for
>>> diacritics.

>> How is there a *need* for diacritics?  Most of the English-speaking world has

> Maintainer names. What is the discussion mostly about?

Rephrase.  How is there a need for diacritics *in English*?  This is what you
asserted in the email I was replying to.  I maintain that English does not
need diacritics.

Maintainer names are a separate issue.

>> If you want to see UTF-8 become a reality, I would suggest focusing on
>> bringing these tools up to snuff rather than debating the question on
>> debian-devel.  Until someone has identified the issues with the existing tools
>> and attempted to address those issues, this is all theoretical discussion.

> I have tried a UTF-8 console.
> The most visible problems are ncurses (being addressed upstream),
> stty (I have no idea what this would need), and libreadline (I saw an old
> patch floating around, maybe I'll look into it sometime)

And what about the problems UTF-8 will cause for people who do not (or cannot)
use UTF-8 consoles?  Our Japanese developers appear to be of the opinion that
even if the tools for working with UTF-8 were all in place today, UTF8 would
not meet their needs for a character set.

>>>      - people around the world are not familiar with non-latin scripts.
>>>        It seems reasonable to require latin-script (not necessarily ASCII)
>>>        versions of names to be included alongside the original ones
>>>        See [1]

>> Yes, it should be ASCII.  ASCII is the common denominator that's present in

> It should, but it cannot. For the translation effort not to look dumb,
> there is a need for _proper_ maintainers' names somewhere. I am trying
> to put them into Packages. If you have other ideas, please tell me.

I concede that it's useful to be able to represent Maintainer names in full
Unicode; that is not in question.  What I disagree with is the argument that
such non-ASCII characters should be included in existing fields of the
Packages file.

If all Unicode is limited to new fields that we introduce into Packages,
there's a very simple mechanism that we can use to provide backwards
compatibility with even the most rudimentary of ASCII-only tools:

$ grep -vE '^(Description-[^:]+|Maintainer-utf8):' < Packages > Packages-ascii

will get you a clean, ASCII-only version of the Packages file that can be
handled by any software that isn't locale-aware.  (Remember that UTF8 causes
problems even for users of other multibyte locales, such as CJK locales).  Do
you have a backwards-compatibility solution for these people if UTF8
characters are allowed in all fields of the Packages file?
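
One caveat with a line-based filter such as grep: it drops only the first
line of a matching field.  A multi-line localized description has
continuation lines (lines starting with a space), and those would be left
behind and end up attached to the preceding field.  A minimal sketch of a
filter that also drops the continuation lines follows.  It assumes the
localized fields are named Description-<lang> and Maintainer-utf8, as proposed
in this thread; neither is an existing field, and the function name is made up
for illustration.  The pattern insists on the hyphen so that a plain
Description line whose text happens to contain a colon is not matched.

```python
import re

# Hypothetical field names from this discussion, not an existing format:
# Description-<lang> (e.g. Description-fr) and Maintainer-utf8.
LOCALIZED = re.compile(r"^(Description-[^:]+|Maintainer-utf8):")


def strip_localized(lines):
    """Yield the lines of a Packages file minus the localized fields.

    Unlike a line-based grep, this also drops the continuation lines
    (those starting with a space or tab) of a removed field.
    """
    skipping = False
    for line in lines:
        if line[:1] in (" ", "\t"):
            # Continuation line: it belongs to the field before it.
            if skipping:
                continue
        else:
            # A new field, or the blank line that separates stanzas.
            skipping = bool(LOCALIZED.match(line))
            if skipping:
                continue
        yield line
```

To use it, open Packages with encoding="utf-8", pass the file object to
strip_localized(), and write the result to Packages-ascii.  The output is
ASCII-only provided the remaining fields contain no 8-bit characters, which is
the premise of the proposal above.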


>> It does count as an argument against utf-8 in Packages, because it's not

> you meant s/utf-8/8-bit characters/, didn't you?

Whichever needs to be argued against. :)

>> agreed that the 8-bit characters in Packages SHOULD be there.

> And we should reach a conclusion about this.
> I guess you agree the current situation is bad, and should be corrected.
> The question is, which way to go: make Packages ASCII only, since ASCII is
> an intersection of all used encodings, or make Packages UTF-8, since
> UTF-8 is a union of all used encodings?

> Do not get me wrong, I am well aware of your reasons for ASCII-only
> Packages; I just feel a bit more radical :-)

We cannot change all the world's software in a day.  Being 'radical' is a
disservice to our users, who need a system that continues to work between now
and the day that UTF8 is available everywhere.

Steve Langasek
postmodern programmer


