
Re: support for multilingual Packages files?



Hi,

At Mon, 30 Jul 2001 11:17:41 +0200,
Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk> wrote:

> Short summary:
> There are in fact 3 different issues that were discussed together,
> I am now trying to explain them separated:
> (by dselect I mean "dselect and similar debian tools")
> 
> 1) if foreign language descriptions are permitted in Packages
> Take for example lyskom-tty-client, which is interesting only
> for swedish speaking people, why should it not have description
> in Swedish then?
> 
> There was not clear consensus about the whole issue, only about one
> thing: Packages must have description in English, so that administrators
> know what are they about. It was undecided if the description can have
> also other language part (maybe even more detailed than the english one).

I don't have a strong opinion on this issue.  However, I prefer that
all packages have an English description.  Until a translation
mechanism is available (i.e., issue (2)), writing both (English and
your favorite language) would be a provisional solution.


> 2) Localized fields in debian/control, such as Description-fr etc.
> This is a different issue than 1), and has not been much discussed.
> Probably the same way as debconf follows could be adopted.
> Notice that even in English, there is an occasional need for
> diacritics.

For English diacritics, "Description-en-gb" and so on could be used.
However, I think this problem is not limited to debian/control files.
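
To make the idea of localized fields concrete, here is a minimal
sketch of how a tool might choose between "Description",
"Description-fr", and "Description-en-gb".  The function name
pick_description, the sample field values, and the fallback order are
my assumptions for illustration only; this is not how dpkg or dselect
actually behaves.

```python
def pick_description(fields: dict, locale: str) -> str:
    """Return the best description for a locale such as 'en_GB.UTF-8'.

    Hypothetical lookup order: full language-country tag, then the bare
    language, then the mandatory English "Description" field.
    """
    # 'en_GB.UTF-8' -> 'en-gb'
    tag = locale.split(".")[0].lower().replace("_", "-")
    candidates = [tag]
    if "-" in tag:
        candidates.append(tag.split("-")[0])  # 'en-gb' -> 'en'
    for suffix in candidates:
        key = "Description-" + suffix
        if key in fields:
            return fields[key]
    return fields["Description"]


# Sample fields (invented for this example).
fields = {
    "Description": "naive cafe finder",
    "Description-en-gb": "naïve café finder",
    "Description-fr": "outil pour trouver un café",
}
print(pick_description(fields, "en_GB.UTF-8"))
print(pick_description(fields, "fr_CA"))
print(pick_description(fields, "ja_JP"))
```

A locale without a matching field falls back to the plain English
description, which matches the consensus that every package must have
one.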


> 3) Most controversal part: what encodings are permitted in Packages
> (and related files, such as debian/changelog...)
> There are these main possibilities:
...
>   b) require using utf-8
...
>      disadvantages:
...
>      - people generally do not use utf-8 locales. This is yet for
>        another fla^H^H^Hdiscussion. IMHO, we should strive to make
>        utf-8 locales usable, so that there is no more need for national
>        ones.

There are reasons not to use UTF-8 locales even once UTF-8 locales
become available.  These problems are so complex that I have no idea
how to solve them.  What is more, Unicode people tend not to even
agree that there ARE problems.  (Yes, I sometimes send mail to the
Unicode Consortium asking them to fix bugs.)


>      I do not buy the argument that this way, dselect will not work
>      on non-utf8 console. It will work! You will just see question marks
>      (or other garbage) in place of utf-8 characters. Situation will be
>      no worse than the current one (see c))

You are right, provided the author of debian/control KNOWS that there
are people who cannot read these characters.  It is true that we can
convert all unprintable characters to question marks using iconv().
However, a converted sentence with too many question marks may be
meaningless.  If the author knows that non-ASCII characters may be
displayed as question marks, the author can take care to write
sentences that remain meaningful for people all over the world.
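
As a rough illustration of the question-mark conversion, the sketch
below uses Python's built-in codec error handling in place of
iconv(); the function name and the sample strings are invented for
this example.

```python
def to_ascii_with_question_marks(text: str) -> str:
    """Replace every character that ASCII cannot represent with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


german = "Werkzeug für Änderungen"
japanese = "日本語の説明 (Japanese description)"

print(to_ascii_with_question_marks(german))    # Werkzeug f?r ?nderungen
print(to_ascii_with_question_marks(japanese))  # ?????? (Japanese description)
```

The German sentence stays readable, but the Japanese one survives only
because an ASCII explanation was written next to it.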

If dselect and similar tools do not do any conversion, the garbage can
be more miserable than you imagine:

1. In multibyte locales, not only the unprintable character but also
   the characters following it will be broken.
2. In general, the numbers of bytes, characters, and columns differ
   from one another, and how they differ depends on the locale.  Thus,
   a mismatch between locale and encoding will break the layout of
   the screen.  (This is not a problem for dumb-terminal-based
   software such as dpkg, which doesn't use '\b' and so on.)
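
The difference between bytes, characters, and columns can be seen
with a small sketch.  The display_columns helper is only an
approximation that I assume here (wide and fullwidth East Asian
characters count as two columns, combining marks as zero); real
terminals may differ.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Approximate the number of terminal columns a string occupies."""
    cols = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        cols += 2 if wide else 1
    return cols


for sample in ("dpkg", "für", "日本語"):
    n_bytes = len(sample.encode("utf-8"))
    print(sample, n_bytes, len(sample), display_columns(sample))
# dpkg:   4 bytes, 4 characters, 4 columns
# für:    4 bytes, 3 characters, 3 columns
# 日本語: 9 bytes, 3 characters, 6 columns

# Reading UTF-8 bytes under the wrong encoding changes the count:
# three characters are misread as nine.
misread = "日本語".encode("utf-8").decode("latin-1")
print(len(misread))  # 9
```

A tool that pads or truncates by byte count, or that assumes the wrong
encoding, will therefore misalign its screen.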


One more disadvantage.  Besides displaying maintainers' names and so
on, people may want to _input_ them.  For example, one may search the
package list by maintainer's name.  However, inputting non-ASCII
characters from the keyboard is much more difficult than displaying
them.



>    d) this was just briefly suggested and not discussed:
>       use mime headers to specify charset
>       Ugh... please, we really do not want to put this into dselect, do we?

The MIME header idea is confused and misses the point.  Keeping
debian/control in 7bit is not the problem.  The point is how to
display the intended characters in the file.  This is the worst idea.

Thus, my opinion is,

   e) mandate ASCII; UTF-8 is optional
      advantages:
      - no change to the present situation is needed, except for buggy
        packages (i.e., packages which use non-ASCII characters).
      - packaging tools are guaranteed to work under any locale.
      - maintainers who want to use non-ASCII characters can use them.
      - maintainers can select ASCII transliteration (for example, 
        it is maintainers' choice to use "u", "ue", or "?" for "u"
        with umlaut.)
      disadvantages:
      - maintainers who want to use non-ASCII characters are forced
        to supply two versions of descriptions (or names, ...).  However,
        since many maintainers don't know much about encodings even
        though Debian wants to be an internationalized distribution,
        it is a good thing to force maintainers to study i18n.
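
A minimal sketch of what option e) could look like in practice: a
check that finds non-ASCII characters in a field, and a
maintainer-chosen transliteration for the mandatory ASCII version.
The function names and the TRANSLIT table are hypothetical; the table
shows only one possible choice ("ue" for "u" with umlaut), which
remains up to the maintainer.

```python
def non_ascii_positions(text: str) -> list:
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Hypothetical table; a maintainer could equally map "ü" to "u".
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def transliterate(text: str, table: dict = TRANSLIT) -> str:
    """Apply the chosen replacements; other characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


name = "Jürgen Müller"
print(non_ascii_positions(name))  # [(1, 'ü'), (8, 'ü')]
ascii_name = transliterate(name)
print(ascii_name)                 # Juergen Mueller
print(non_ascii_positions(ascii_name))  # []
```

Characters missing from the table are left untouched, so the check
should be run again on the result before the ASCII version is used.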


---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


