
Re: support for multilingual Packages files?



On Wed, Jul 11, 2001 at 10:57:40AM -0500, Steve Langasek wrote:
> On Wed, 11 Jul 2001, Radovan Garabik wrote:
> 
> > On Tue, Jul 10, 2001 at 10:02:19AM -0500, Steve Langasek wrote:
> > > I think it would be nice to have support for full Unicode for maintainer
> > > names (my own last name properly falls into ISO-8859-2), but I think this
> > > should be done *in addition to* the regular Maintainer: field; e.g., we could
> > > have an optional Maintainer-utf8: field which allows for Unicode
> > > representations, but most tools would use only the Maintainer: field by
> > > default.
> 
> > It strikes me as a bit of over-complication.
> 
> > Moreover, non-ascii chars can appear elsewhere, not just in
> > maintainer's name (e.g. in package description)
> 
> No.  You seem to be the only one arguing here that allowing non-ascii
> characters interspersed throughout the file is in any way a good thing.  The
> package descriptions - the *default*, *international* package descriptions -
> should all be in English, and should all be limited to ASCII.  The current
> state of the art on Linux doesn't allow using UTF-8 everywhere, so allowing
> non-ASCII characters in the package description just means it'll be harder for
> some people to read these descriptions; and allowing multiple languages means
> that the package file will be equally unusable for everyone.

I already discussed this: there are certain packages where a
foreign-language description is useful. (Of course, there SHOULD be an
English translation as well; I am not arguing about this!)

There is no policy about it. If we just put into policy that
languages other than English are forbidden in Packages, I'll shut up
(after making another round of flaming, of course :-))

And besides... proper English _does_ have diacritics (naïve being the most
often cited example, Pokémon perhaps more familiar today); it is just that
those ignorant Americans <duck, run> do not realize it.

A package can have a proper name in its Description;
how are you going to handle that?

> 
> It's important to support i18n, but not in such a way that it makes life
> difficult for our users.  Throwing UTF-8 at everybody when the existing tools
> can't handle it definitely makes life difficult for our users.
> 

The key point is _existing_.
Unicode support is improving all the time.

Moreover, if UTF-8 is used in Packages, it does not mean that people
without UTF-8 tools cannot access it. They won't see a letter or two of
a maintainer's name (big deal); they won't see a package description in
a foreign language properly (which they would not understand anyway);
and they may get some graphical glitches in dselect (which depends on
ncurses' ability to deal with UTF-8 anyway). Most importantly, if they
want to see the name properly, they would KNOW what they should do.
(In the current situation one has to guess which encoding is used in
Packages; fortunately, that affects only a few unimportant words in a
few unimportant packages.)
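
To make the "letter or two" point concrete, here is a minimal Python
sketch. The stanza, the package name, the maintainer name and the
parse_stanza helper are all made up for illustration; this is not how
dpkg or dselect actually read Packages, only a rough model of a tool
that understands UTF-8 next to one that understands nothing but ASCII.

```python
# Hypothetical Packages stanza stored as UTF-8 bytes (not from a real archive).
# "J\u00e1n Nov\u00e1k" is an invented maintainer name with two accented letters.
STANZA = (
    "Package: example-pkg\n"
    "Maintainer: J\u00e1n Nov\u00e1k <jan@example.org>\n"
    "Description: an example package\n"
).encode("utf-8")


def parse_stanza(text):
    """Split 'Field: value' lines into a dict (continuation lines are ignored)."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


# A UTF-8 aware tool decodes the stanza exactly as written.
utf8_view = parse_stanza(STANZA.decode("utf-8"))

# An ASCII-only tool cannot decode the accented letters; with errors="replace"
# each undecodable byte becomes U+FFFD, but every field is still present.
ascii_view = parse_stanza(STANZA.decode("ascii", errors="replace"))
```

In this sketch both views contain the same fields, and only the two
accented letters of the name are damaged in the ASCII view.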

> 
> > If the user wants to choose between full utf-8 or plain ascii version,
> > dselect should be able to do the conversion (and there is already a
> > tool for that: filterm(1))
> 
> If the dselect maintainers wish to support this mode of operation, that's
> fine; until that's implemented, non-ascii characters should be disallowed in
> almost all fields.  Also, dselect is not the only tool used to read the
> Packages file.
> 
> Users should not need to know about konwert in order to get package
> descriptions they can read.

Not by default, I agree. konwert is for people whose terminals are
not UTF-8 capable (that is, broken from the future's point of view).
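
For what it's worth, a crude UTF-8 to ASCII fallback is not much code.
The sketch below is Python and uses only the standard unicodedata
module; the to_ascii name is my own invention, and this is not a
description of how konwert or filterm work internally. It decomposes
accented letters and then throws away everything that is not ASCII.

```python
import unicodedata


def to_ascii(text):
    """Approximate text in plain ASCII by dropping accents.

    NFKD normalization splits an accented letter into a base letter plus
    combining marks; encoding with errors="ignore" then discards the marks.
    Characters that have no such decomposition are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


# Plain ASCII input passes through unchanged; accented letters lose the accent.
example = to_ascii("na\u00efve Pok\u00e9mon")
```

The obvious weakness is that letters without a decomposition simply
vanish, so a real filter would want a proper transliteration table.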

> 
> > Or, better yet, when localized Packages are implemented, one of the
> > localizations can be en_US.ASCII.
> 
> Yes -- the default l10n.  The default locale on Linux is the C locale, and

You see Packages as (part of) a tool for installing programs.
I see Packages as a central database containing information about
packages, and as such, it should be as complete as possible.
Think about the translation effort: when accented characters are not present in
the central Packages file, putting them back into a Packages-<lang> file
would be rather difficult.

> there is a reason for this: it's the least common denominator, the only thing
> we can rely on being supported by all applications.  For the same reason, the
> default Packages file should be in the C locale (with the exception of any
> fields explicitly defined to be UTF8), so that it remains equally accessible
> to everyone regardless of the software they have available.

But the Packages file is hardly used outside of Debian, so if the Debian infrastructure
is moving towards UTF-8, I see no reason not to allow UTF-8 in Packages
(not today, of course, but we should not concentrate on temporary short-term
patches like Maintainer-utf8 when better solutions are on the horizon).

The key point, as I see it, is the move from national charset locales
to UTF-8 locales. This will be painful, but it has to be done sooner or later.
Glibc is moving in this direction; perhaps after a few more glibc releases
UTF-8 locales will be feasible.

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


