
Re: support for multilingual Packages files?



On Fri, Jul 27, 2001 at 04:13:16PM -0700, Yves Arrouye wrote:
> I missed some of the discussion, but isn't there some difference between
> selecting an encoding and having many languages (what multilingual means)?
> 
> The package description file could be in UTF-8 all the time (ASCII being a
> very nice subset of that, so nothing breaks) and then records would be
> localized (the same record would appear many times, tagged with locale
> information). Then we would say "We require at least an English version of
> each field" instead of saying "Use ASCII."
> 
> I am sure you discussed that but I missed it. If not, ...
> 
> YA

Short summary:
Three different issues were in fact discussed together;
I will now try to explain them separately.
(By dselect I mean "dselect and similar Debian tools".)

1) Whether foreign-language descriptions are permitted in Packages.
Take for example lyskom-tty-client, which is interesting only
to Swedish-speaking people. Why should it not have a description
in Swedish, then?

There was no clear consensus on the whole issue, only on one
thing: packages must have a description in English, so that administrators
know what they are about. It was left undecided whether the description may
also have a part in another language (maybe even more detailed than the
English one).

2) Localized fields in debian/control, such as Description-fr etc.
This is a different issue from 1), and has not been discussed much.
Probably the same approach that debconf uses could be adopted.
Note that even in English, there is an occasional need for
diacritics.
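
To make 2) concrete, here is a minimal sketch of how a tool might pick a
localized field and fall back to the mandatory English one. Only the
Description-fr naming comes from the discussion; the package name, the
description texts and the helper names parse_stanza and pick_description
are invented for illustration, and this is not how dselect or debconf
actually does it.

```python
def parse_stanza(text):
    """Parse one control-file style stanza into a dict.

    Continuation lines (starting with a space or tab) are appended
    to the value of the preceding field.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if line.startswith((" ", "\t")) and key is not None:
            fields[key] += "\n" + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def pick_description(fields, lang):
    """Return Description-<lang> if present, else the English Description."""
    return fields.get("Description-" + lang, fields["Description"])


# Hypothetical stanza: package name and texts are made up for this example.
STANZA = (
    "Package: example-pkg\n"
    "Description: an example package\n"
    "Description-fr: un paquet d'exemple\n"
)

fields = parse_stanza(STANZA)
french = pick_description(fields, "fr")   # localized field exists
german = pick_description(fields, "de")   # no Description-de: English fallback
```

The fallback only works because an English Description is always
required, which is the one point from 1) that had consensus.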

3) The most controversial part: which encodings are permitted in Packages
(and related files, such as debian/changelog...).
These are the main possibilities:
  a) mandate ASCII only.
     advantages: 
     - it works as intended everywhere, no matter what your
       locale is (since ASCII is a common subset of the charsets
       of all other locales)
     - it is easy to maintain
     disadvantages:
     - if 1) is accepted, then ASCII is clearly insufficient
     - there is no way you can have proper maintainers' names
       if the encoding is ASCII. Some languages (Japanese, German) have
       a standardized way of transcribing names into ASCII, some
       others (Russian, Slovak, Hungarian) have not. Some
       languages not written in Latin script (Serbian, Chinese) have
       a standardized way of transcribing names into Latin script with
       diacritics.                                                       [1]
     - for 2), one can assume there will be one original Packages file,
       and groups of people will be translating it into target languages.
       Once the names are incorrect (ASCII only) in the original Packages,
       there is no easy way to put them correctly into the translated
       Packages, even if the required diacritics or script are available
       in the target language.
  b) require using utf-8
     advantages: 
     - ASCII is a subset of utf-8, so all tools will work on this (those
       that do not are buggy and should be fixed)
     - no problems with encodings, diacritics, foreign names etc...
     - no problems with translation
     disadvantages:
     - lack of a proper utf-8 terminal (locale). One can expect this will
       improve in the future
     - lack of utf-8 editors. Once vim 6.0 enters Debian, the situation
       will be better
     - people generally do not use utf-8 locales. This is a topic for yet
       another fla^H^H^Hdiscussion. IMHO, we should strive to make
       utf-8 locales usable, so that there is no more need for national
       ones.
     - people around the world are not familiar with non-Latin scripts.
       It seems reasonable to require a Latin-script (not necessarily
       ASCII) version of names to be included alongside the original ones.
       See [1]
     I do not buy the argument that, this way, dselect will not work
     on a non-utf8 console. It will work! You will just see question marks
     (or other garbage) in place of utf-8 characters. The situation will be
     no worse than the current one (see c))
  c) leave the situation as it is today: no encoding is specified,
     maintainers who feel the need to put non-ASCII characters there
     just put them there in an encoding which seems natural to them.
     advantages: 
     - if the administrator happens to have the same locale charset as
       the maintainer, he will see the text as intended
     - easy work for the maintainers
     disadvantages:
     - it is a mess. In order to see the name (or description) properly,
       you have to guess the encoding and set up your console for it
     - nobody in the world can see the whole of Packages properly, since
       there is no common console setting.
  d) this was just briefly suggested and not discussed:
     use MIME headers to specify the charset
     Ugh... please, we really do not want to put this into dselect, do we?
     (and besides, once that recoding is put into dselect, we may as well
      stick to utf-8 and save some additional effort)
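
The encoding claims in a), b) and c) can be checked with a small sketch.
The maintainer name below is hypothetical, the helper render is invented
for this example, and decoding with errors="replace" is only a rough
stand-in for what a real console does with bytes it cannot interpret.

```python
# Hypothetical maintainer name with Slovak diacritics (not a real person).
NAME = "Ján Kováč"


def render(data, console_charset):
    """Approximate what a console using console_charset shows for raw bytes.

    Bytes that are invalid in that charset become U+FFFD replacement
    characters instead of raising an error.
    """
    return data.decode(console_charset, errors="replace")


# a) / b): pure ASCII text is byte-for-byte identical when stored as utf-8,
# so ASCII-only tools and files keep working.
ascii_line = "Package: dselect"
same_bytes = ascii_line.encode("ascii") == ascii_line.encode("utf-8")

# b): utf-8 data on a utf-8 console is shown as intended ...
on_utf8 = render(NAME.encode("utf-8"), "utf-8")
# ... and on a non-utf8 console only the non-ASCII characters are garbled;
# the ASCII letters around them stay readable.
on_latin2 = render(NAME.encode("utf-8"), "iso-8859-2")
on_ascii = render(NAME.encode("utf-8"), "ascii")

# c): with no declared encoding, a maintainer writing in ISO-8859-2 and an
# administrator reading in ISO-8859-1 silently see different text.
on_latin1_from_latin2 = render(NAME.encode("iso-8859-2"), "iso-8859-1")

# c): the same ISO-8859-2 bytes are not even valid utf-8, so a strict
# reader has to guess the encoding or fail.
try:
    NAME.encode("iso-8859-2").decode("utf-8")
    latin2_is_valid_utf8 = True
except UnicodeDecodeError:
    latin2_is_valid_utf8 = False
```

In case b) the damage is limited to the non-ASCII characters and the
encoding is at least known; in case c) the reader cannot even tell which
charset to try.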

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


