
Re: support for multilingual Packages files?



On Wed, 11 Jul 2001, Radovan Garabik wrote:

> On Tue, Jul 10, 2001 at 10:02:19AM -0500, Steve Langasek wrote:
> > I think it would be nice to have support for full Unicode for maintainer
> > names (my own last name properly falls into ISO-8859-2), but I think this
> > should be done *in addition to* the regular Maintainer: field; e.g., we could
> > have an optional Maintainer-utf8: field which allows for Unicode
> > representations, but most tools would use only the Maintainer: field by
> > default.

> It strikes me as a bit of over-complication.

> Moreover, non-ASCII chars can appear elsewhere, not just in the
> maintainer's name (e.g. in the package description)

No.  You seem to be the only one arguing here that allowing non-ASCII
characters interspersed throughout the file is in any way a good thing.  The
package descriptions - the *default*, *international* package descriptions -
should all be in English, and should all be limited to ASCII.  The current
state of the art on Linux doesn't allow using UTF-8 everywhere, so allowing
non-ASCII characters in the package description just means it'll be harder for
some people to read these descriptions; and allowing multiple languages means
that the package file will be equally unusable for everyone.

It's important to support i18n, but not in such a way that it makes life
difficult for our users.  Throwing UTF-8 at everybody when the existing tools
can't handle it definitely makes life difficult for our users.


> If the user wants to choose between the full UTF-8 or plain ASCII version,
> dselect should be able to do the conversion (and there is already a
> tool for that: filterm(1))

If the dselect maintainers wish to support this mode of operation, that's
fine; until that's implemented, non-ASCII characters should be disallowed in
almost all fields.  Also, dselect is not the only tool used to read the
Packages file.

Users should not need to know about konwert in order to get package
descriptions they can read.

> Or, better yet, when localized Packages are implemented, one of the
> localizations can be en_US.ASCII.

Yes -- the default l10n.  The default locale on Linux is the C locale, and
there is a reason for this: it's the least common denominator, the only thing
we can rely on being supported by all applications.  For the same reason, the
default Packages file should be in the C locale (with the exception of any
fields explicitly defined to be UTF-8), so that it remains equally accessible
to everyone regardless of the software they have available.
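
To make the rule above concrete, here is a minimal Python sketch (not an
existing Debian tool) that checks one Packages stanza and reports any field
containing non-ASCII characters, other than fields explicitly allowed to
carry UTF-8.  The allow-list and the function name are assumptions made for
illustration; "Maintainer-utf8" is only the field name floated earlier in
this thread, not a defined field.

```python
# Hypothetical allow-list of fields permitted to carry UTF-8.
# "maintainer-utf8" is the name proposed in this thread, not a real field.
UTF8_FIELDS = {"maintainer-utf8"}


def non_ascii_fields(stanza):
    """Return the names of fields in one Packages stanza that contain
    non-ASCII characters and are not listed in UTF8_FIELDS."""
    offenders = []
    current = None  # name of the field the current line belongs to
    for line in stanza.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t":
            # A line starting with whitespace continues the previous field.
            name = current
        else:
            name = line.partition(":")[0]
            current = name
        if name is None:
            continue  # continuation line with no field before it
        if name.lower() in UTF8_FIELDS:
            continue  # this field is explicitly allowed to be UTF-8
        if not line.isascii() and name not in offenders:
            offenders.append(name)
    return offenders
```

For example, a stanza whose Description continuation line contains an
accented character would be reported under "Description", while the same
character in a Maintainer-utf8 field would be accepted.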

Steve Langasek
postmodern programmer


