
Re: support for multilingual Packages files?



On Mon, Jul 30, 2001 at 01:04:50PM +0200, Michael Bramer wrote:
> 
> IMHO the packages, the control file and the Packages files are too big
> with this. We need a better system!

probably. But we have to stick with the current one for a while.

> 
> Now we translate the description and save the translation in a
> database (it is not a real database, but this is not your problem).
> With this db and a Packages file we generate new translated Packages
> files daily.
> 
> see i18n@lists.debian.org, auric.debian.org/~grisu/ddts,
> http://www.laespiral.org/proyectos/debian-es, and other sites.
> 
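The workflow Michael describes (translations kept in a separate store, merged into freshly generated Packages files every day, touching only the Description field) can be sketched roughly as follows. This is a hypothetical illustration, not the DDTS code: the `translations` dictionary stands in for the translation database, `parse_stanzas` and `translate_packages` are invented names, and the parser handles only the basic "Field: value" plus indented-continuation syntax.

```python
def parse_stanzas(text):
    """Split Packages text into stanzas, each a list of [field, value] pairs.

    Continuation lines (starting with a space or tab) are appended to the
    previous field's value, as in Debian control-file syntax.
    """
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = []
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and fields:
                fields[-1][1] += "\n" + line
            else:
                name, _, value = line.partition(":")
                fields.append([name, value.strip()])
        stanzas.append(fields)
    return stanzas


def translate_packages(text, translations):
    """Return Packages text with Description replaced wherever the
    (hypothetical) translation store has an entry; other fields untouched."""
    out = []
    for fields in parse_stanzas(text):
        package = next((v for n, v in fields if n == "Package"), None)
        for field in fields:
            if field[0] == "Description" and package in translations:
                field[1] = translations[package]
        out.append("\n".join(f"{n}: {v}" for n, v in fields))
    return "\n\n".join(out) + "\n"


# Made-up example data; the e-mail address is a placeholder.
packages = """Package: hello
Maintainer: Radovan Garabik <garabik@example.org>
Description: example program
 A longer English description.

Package: other
Description: untranslated package
"""

# Hypothetical German translation for "hello" only.
translations = {"hello": "Beispielprogramm\n Eine längere deutsche Beschreibung."}
print(translate_packages(packages, translations))
```

Packages without a translation keep their English description, so the generated file stays usable even when the translation store is incomplete.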
> > 3) Most controversial part: what encodings are permitted in Packages
> > (and related files, such as debian/changelog...)
> 
> debian/changelog is not the problem. This file will never be translated.

it is not about translation; here it is mostly about the default character set.

> 
> We do this. We make a translated Packages file, but we only
> change the 'Description:' field, not the other fields.
> 

and that is suboptimal.

Take a maintainer, let's say Radovan Garabík (myself). Now, my
Maintainer field says "Radovan Garabik", since I was sensible enough not
to put "í" there in ISO-8859-2 (or any other) encoding.
If you do a translation into Slovak/Czech, there is no reason whatsoever
not to include the correct name. In fact, using it without diacritics
will look really dumb when the rest of dselect is in correct
Slovak/Czech. (And I am lucky: there is just this one little acute
accent in my name.)

> But you can translate 'names' the same way. Make a table with the
> correct spelling and translate this too.

And where do you want to keep the table with correct spellings?
My idea was to keep it in the original Packages, in utf-8 encoding.

> For now I consider only the Packages file. In the Packages file we have
> only two important fields with respect to translation/encoding:
>  - Maintainer
>  - Description
> 
> I propose:
>  - use only ASCII in the (normal) Packages file
>  - make a database with translations of the Maintainer's and
>    Description (the Description is imho more important).

Or you can unify the Maintainer fields and keep them in the original Packages.

yes, the description is important, so unlike the maintainer's name it should
be guaranteed to be readable everywhere; that implies ASCII encoding (for the
original English description in the original Packages).

On Mon, Jul 30, 2001 at 08:16:37PM +0900, Tomohiro KUBOTA wrote:
> > 
> > There was no clear consensus about the whole issue, except on one
> > thing: Packages must have descriptions in English, so that administrators
> > know what the packages are about. It was undecided whether the description
> > may also have a part in another language (maybe even more detailed than
> > the English one).
> 
> I don't have a strong opinion on this issue.  However, I prefer all
> packages having an English description.  Until a translation mechanism
> is available (i.e., issue (2)), writing both (English and your
> favorite language) would be a tentative solution.

I am glad to see that a translation mechanism is already being worked on.

> 
> > 3) Most controversial part: what encodings are permitted in Packages
> > (and related files, such as debian/changelog...)
> > There are these main possibilities:
> ...
> >   b) require using utf-8
> ...
> >      disadvantages:
> ...
> >      - people generally do not use utf-8 locales. This is a topic for
> >        yet another fla^H^H^Hdiscussion. IMHO, we should strive to make
> >        utf-8 locales usable, so that there is no more need for national
> >        ones.
> 
> There are reasons not to use UTF-8 locales even once UTF-8 locales
> are available.  These problems are so complex that I have no idea
> how to solve them.  Moreover, Unicode people tend not even to
> agree that there ARE problems.  (Yes, I sometimes send mail to the
> Unicode Consortium asking them to fix bugs.)

Problems should be made visible and discussed, and solutions
should be found, instead of just saying "Unicode is bad, we are never
going to accept it" (no, I am not talking about you, Tomohiro; I know you are
reasonable, but I know several people with this attitude).
IMHO there is no better alternative than Unicode for a global encoding,
Unicode is not _that_ bad, and it is already gaining a strong position
elsewhere.

> 
> 
> >      I do not buy the argument that this way dselect will not work
> >      on a non-utf8 console. It will work! You will just see question marks
> >      (or other garbage) in place of utf-8 characters. The situation will be
> >      no worse than the current one (see c)).
> 
> You are right, if the author of the debian/control file KNOWS that there
> are people who cannot read these characters.  It is true we can convert
> all unprintable characters to question marks using iconv().  However,
> a converted sentence with too many question marks may be meaningless.

It will mostly concern names. You will not see names properly. Yes, this
is not ideal, but not much else can be done about it.
(Oh, well, and occasional diacritics in English words like rôle and résumé.
I think it is reasonable to leave this to the maintainer's common sense:
he can decide whether he prefers the "correct" spelling no matter what, or
leaves the diacritics out and saves people without a utf-8 console some
headache.)
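To make the question-mark fallback concrete, the sketch below uses Python's `str.encode` with `errors="replace"`, which substitutes "?" for every character the target encoding lacks, in the spirit of the iconv() conversion Tomohiro mentions. It is only an illustration; what dselect or a given iconv() call would actually output is not specified in this thread, and `for_console` is a hypothetical helper name.

```python
def for_console(text, console_encoding):
    """Re-encode text for a console that only understands console_encoding.

    Every character the encoding cannot represent becomes '?', similar to
    converting with iconv() and substituting unconvertible characters.
    """
    return text.encode(console_encoding, errors="replace").decode(console_encoding)


for sample in ["Radovan Garabík", "résumé", "日本語"]:
    print(sample, "| ascii:", for_console(sample, "ascii"),
          "| latin-2:", for_console(sample, "iso-8859-2"))
```

A Latin-script name loses a letter or two and stays recognizable ("Radovan Garab?k"), which is Radovan's point; text in a script the console cannot show at all collapses into a row of "?", which is Tomohiro's concern.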

> If the author knows that non-ASCII characters may be displayed as question
> marks, the author can take care to write meaningful sentences for people
> all over the world.

Agreed.

> 
> If dselect and so on do not do any conversion, the garbage can be more
> miserable than you imagine.  1. Not only the unprintable character
> but also the characters following it will be broken in multibyte locales.

How many of them?
Maybe one or two. Does the current Packages file give you many problems?
(As long as it does not produce a charset-switching ESC sequence, nothing
too bad happens; am I correct?)


> 2. Generally, the numbers of bytes, characters, and columns differ from
> one another, and how they differ depends on the locale.  Thus, a
> mismatch between locale and encoding will break the layout of the
> screen.  (This is not a problem for dumb-terminal-based software
> such as dpkg, which doesn't use '\b' and so on.)

yes, this is bad. However, it does not make dselect unusable.

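The point that bytes, characters, and screen columns are three different counts can be shown with Python's standard `unicodedata` module. The width rule here is a simplification assumed for illustration (roughly what wcwidth-style functions do, not a complete implementation), and the `cjk_locale` switch is a hypothetical stand-in for East Asian locales treating "ambiguous"-width characters as double width.

```python
import unicodedata


def display_columns(text, cjk_locale=False):
    """Rough terminal width of text (a simplification, not a full wcwidth).

    Wide ('W') and Fullwidth ('F') characters take two columns; in a CJK
    locale, East Asian 'Ambiguous' ('A') characters are assumed to take two
    columns as well. Combining marks take none; everything else takes one.
    """
    double = ("W", "F", "A") if cjk_locale else ("W", "F")
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in double else 1
    return width


for s in ["Garabik", "Garabík", "日本語"]:
    print(f"{s}: {len(s.encode('utf-8'))} bytes, {len(s)} characters, "
          f"{display_columns(s)} columns (non-CJK locale), "
          f"{display_columns(s, cjk_locale=True)} columns (CJK locale)")

# UTF-8 bytes interpreted as Latin-1: the single character "í" turns into
# two characters ('Ã' and a soft hyphen), so character counts and column
# positions no longer match what the author intended.
mojibake = "Garabík".encode("utf-8").decode("latin-1")
print(repr(mojibake), len(mojibake), "characters")
```

A program that pads fields by byte count or character count will misalign columns as soon as any of these three numbers differ, which is why a locale/encoding mismatch disturbs a full-screen layout like dselect's but not line-oriented output like dpkg's.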
> 
> 
> One more disadvantage.  Besides displaying maintainers' names and so
> on, people may want to _input_ them.  For example, one may search the
> package list by maintainer name.  However, inputting non-ASCII
> characters from the keyboard is much more difficult than displaying them.

Yes, this is a big problem. I ran into it when I installed an
intranet full-text search engine (and also when I made a multilingual
on-line dictionary).
Fortunately, searching by maintainers' names is not a key feature of
Debian tools.
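One common mitigation for the input problem (an assumption here, not something proposed in this thread) is to match on a diacritic-folded form, so that a user who can only type ASCII still finds "Garabík" by typing "garabik". The sketch uses Python's `unicodedata`; `fold` and `search_maintainer` are hypothetical helper names.

```python
import unicodedata


def fold(text):
    """Lower-case text and strip diacritics.

    NFKD splits a precomposed letter such as 'í' into 'i' plus a combining
    acute accent; the combining marks are then dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_maintainer(query, maintainers):
    """Return the maintainers whose folded name contains the folded query."""
    q = fold(query)
    return [m for m in maintainers if q in fold(m)]


maintainers = ["Radovan Garabík", "Tomohiro KUBOTA", "Michael Bramer"]
print(search_maintainer("garabik", maintainers))
```

This only helps Latin-script names whose accents decompose; it does nothing for names written entirely in, say, Japanese, where the input problem remains.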

> 
> Thus, my opinion is,
> 
>    e) mandate ASCII; UTF-8 is optional

optional in the same Packages file?

>       advantages:
>       - needs no change to the present situation except for buggy packages
>         (i.e., packages which use non-ASCII characters).
>       - it is guaranteed that packaging tools work under any locale.
>       - maintainers who want to use non-ASCII characters can use them.
>       - maintainers can select ASCII transliteration (for example, 
>         it is maintainers' choice to use "u", "ue", or "?" for "u"
>         with umlaut.)

they could do this under b) too, if they wanted

>       disadvantages:
>       - maintainers who want to use non-ASCII characters are forced
>         to supply two versions of descriptions (or names,...).  However,

or they could decide not to include the ASCII version at all,
so that nobody is confused by an incorrect variant of their name (I am
talking about Latin-script names with diacritics here)

Well, this is not so much different from b).
I'd say, it is almost the same :-)

>         since many maintainers know little about encodings, despite
>         Debian wanting to be an internationalized distribution,
>         it is a good thing to force maintainers to study i18n.

A Good Thing absolutely!
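Whichever of b) or e) is chosen, a checking tool would first need to tell plain ASCII, valid UTF-8, and legacy 8-bit bytes apart. The sketch below does only that byte-level classification in Python; the policy names "b" and "e" merely mirror the options discussed above, and both functions are hypothetical, not part of dpkg or any existing checker.

```python
def classify_field(raw: bytes) -> str:
    """Classify the raw bytes of a Packages field value.

    Returns 'ascii', 'utf-8' (valid UTF-8 containing non-ASCII characters),
    or 'invalid' (for example a name written in ISO-8859-2).
    """
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "invalid"


def acceptable(raw: bytes, policy: str) -> bool:
    """Hypothetical rule for the mandatory value of a field.

    policy 'b': the value must be UTF-8 (plain ASCII is valid UTF-8).
    policy 'e': the value must be ASCII; a UTF-8 variant would be extra.
    """
    kind = classify_field(raw)
    if policy == "b":
        return kind != "invalid"
    if policy == "e":
        return kind == "ascii"
    raise ValueError(f"unknown policy: {policy!r}")


for raw in [b"Radovan Garabik",
            "Radovan Garabík".encode("utf-8"),
            "Radovan Garabík".encode("iso-8859-2")]:
    print(raw, classify_field(raw), acceptable(raw, "b"), acceptable(raw, "e"))
```

Because ASCII is a subset of UTF-8, every value acceptable under e) is also acceptable under b), which is one way to read the remark above that the two options are almost the same.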
 

-- 
 -----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


