Re: support for multilingual Packages files?
On Mon, 30 Jul 2001, Radovan Garabik wrote:
>>> 2) Localized fields in debian/control, such as Description-fr etc.
>>> This is a different issue than 1), and has not been much discussed.
>>> Probably the same way as debconf follows could be adopted.
>>> Notice that even in English, there is an occasional need for
>>> diacritics.
>> How is there a *need* for diacritics? Most of the English-speaking world
>> has
> Maintainer names. What is the discussion mostly about?
Rephrase. How is there a need for diacritics *in English*? This is what you
asserted in the email I was replying to. I maintain that English does not
need diacritics.
Maintainer names are a separate issue.
>> If you want to see UTF-8 become a reality, I would suggest focusing on
>> bringing these tools up to snuff rather than debating the question on
>> debian-devel. Until someone has identified the issues with the existing tools
>> and attempted to address those issues, this is all theoretical discussion.
> I have tried UTF-8 console.
> The most visible problems are ncurses (being addressed upstream),
> stty (I have no idea what this would need), and libreadline (I saw an old
> patch floating around, maybe I'll look into it sometime)
And what about the problems UTF-8 will cause for people who do not (or cannot)
use UTF-8 consoles? Our Japanese developers appear to be of the opinion that
even if the tools for working with UTF-8 were all in place today, UTF-8 would
not meet their needs for a character set.
>>> - people around the world are not familiar with non-Latin scripts.
>>> It seems reasonable to require a Latin-script (not necessarily ASCII)
>>> version of names to be included alongside the original ones.
>>> See [1]
>> Yes, it should be ASCII. ASCII is the common denominator that's present in
> It should, but it could not. For translation effort not to look dumb,
> there is a need for _proper_ maintainers names somewhere. I am trying
> to put it into Packages. If you have other ideas, please tell.
I concede that it's useful to be able to represent Maintainer names in full
Unicode; that is not in question. What I disagree with is the argument that
such non-ASCII characters should be included in existing fields of the
Packages file.
If all Unicode is limited to new fields that we introduce into Packages,
there's a very simple mechanism that we can use to provide backwards
compatibility with even the most rudimentary of ASCII-only tools:
$ grep -vE '^(Description-[^:]+|Maintainer-utf8):' < Packages > Packages-ascii
will get you a clean, ASCII-only version of the Packages file that can be
handled by any software that isn't locale-aware. (Remember that UTF-8 causes
problems even for users of other multibyte locales, such as CJK locales.) Do
you have a backwards-compatibility solution for these people if UTF-8
characters are allowed in all fields of the Packages file?
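
To make the filtering idea concrete, here is a rough sketch of how it might
be extended to multi-line fields. It assumes the localized fields would be
named Description-<lang> and Maintainer-utf8 (hypothetical names, nothing is
decided), and that a field body continues on following lines that begin with
whitespace, as in the existing control-file format. strip_localized is just a
name made up for this example:

```shell
# Drop the hypothetical localized fields (Description-<lang>, Maintainer-utf8)
# together with their continuation lines; pass everything else through.
strip_localized() {
    awk '
        # Start of a localized field: drop it and remember we are inside one.
        /^(Description-[^:]+|Maintainer-utf8):/ { skip = 1; next }
        # Continuation line: it shares the fate of the field it belongs to.
        /^[ \t]/ { if (!skip) print; next }
        # Any other field, or the blank line between stanzas: keep it.
        { skip = 0; print }
    '
}
```

It would be used the same way as the one-liner:

$ strip_localized < Packages > Packages-ascii

The plain Description and Maintainer fields are left alone, so this only
yields ASCII output if those existing fields are themselves kept ASCII-only.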
>> It does count as an argument against utf-8 in Packages, because it's not
> you meant s/utf-8/8-bit characters/, didn't you?
Whichever needs to be argued against. :)
>> agreed that the 8-bit characters in Packages SHOULD be there.
> And we should make a conclusion about this.
> I guess you agree current situation is bad, and should be corrected.
> The question is, which way to go: make Packages ASCII only, since ASCII is
> an intersection of all used encodings, or make Packages UTF-8, since
> UTF-8 is a union of all used encodings?
> Do not take me wrong, I am well aware of your reasons for ASCII only
> Packages, I just feel a bit more radical :-)
We cannot change all the world's software in a day. Being 'radical' is a
disservice to our users, who need a system that continues to work between now
and the day that UTF-8 is available everywhere.
Steve Langasek
postmodern programmer