Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> On Tue, 4 Sep 2001, Michael Bramer wrote:
> > On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:
> > > > I don't know enough about gettext - am I assuming correctly that in
> > > > the .mo file, the English translation is replaced with a checksum or
> > > > similar, so you do not need to store the complete English translation?
> > > Gettext normally uses the entire untranslated string as the key in the .mo
> > > file. This has many advantages when dealing with translation of strings in
> > > programs, where the untranslated string is actually present in the program
> > > source, and this is a big reason the GNU project favors gettext over catgets
> > > systems found on other Unices. It makes less sense in the case of package
> > > descriptions, however, because we're effectively doing two lookups -- first to
> > > find the English description in Packages.gz using the package name and version
> > > as a key, then to find the translated description in the .mo file using the
> > > English description as a key.
> > yes, you must do two lookups: first in the package db (normally in
> > memory) and then (if LANG is set) a second lookup with gettext.
> > But this is not a big problem, or is there a problem?
> It casts doubt on the argument that gettext is a good solution here. Just
> because gettext is the optimal solution for translation of messages within
> programs does not mean it's the best solution for package translations. I'm
> personally willing to do a little wheel-engineering if it leads to a more
> elegant result.
> > If you put only the translated text in the db, and you don't use the
> > English text as the key (as gettext does), you may get outdated translations.
> Only if the implementation is poor. The accuracy of a translation can be
> verified in the process of assembling the file that is to be made available to
> user machines (whether that file is Packages.gz, or debian-descs.mo, or
> whatever). Obviously the /inputs/ used to create this file must include
> mappings of English string -> translated string, but these mappings need not
> be retained in the output file. We only need to make sure once that the
> translation is up-to-date, not every time the user runs dpkg, because each
> version of each package can have only one untranslated description associated
> with it -- it's a unique key, by definition.
> If nothing else, perhaps you would consider that a .mo file containing
> [untranslated string -> translation] mappings will on average be almost twice
> as large as a .mo file containing [(package name,version) -> translation]
> mappings. :)
The problem is that you won't have to do a little wheel engineering, but a
lot of it. Think about what you will have to design:
- the extraction tool (control -> po file)
OK, that's true for all solutions ;) I'm working on a patch against
gettext so that it can handle text in RFC 822 format.
- a mechanism to help the translator find which texts have to be
translated in the po file.
With your solution, the translator will face something like
and how will they find which text they have to translate? Most of the
translators I know run the stable version of Debian, because they
are not as geeky as the maintainers.
- a mechanism to produce the mo file, or whatever. If you stick to the po
format, you can reuse msgfmt, though.
- an output mechanism, including the fallback to the original if the
translation is outdated. You have either to rewrite msgfmt to do this job at
the previous step, or to design a new function in dpkg, apt, grep-dctrl, and
every program using the translated descriptions.
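To make the fallback point concrete, here is a rough sketch in Python. It is
not dpkg, apt or gettext code; the package name, versions, strings and
function names are all invented for illustration. It contrasts a lookup keyed
on the English text (gettext-style, where a changed description simply misses
and the English text comes back) with a lookup keyed on (name, version), which
needs its own check when the file is built, plus its own fallback.

```python
# Hypothetical sample data; package names, versions and strings are invented.
descriptions = {
    ("hello", "1.0"): "friendly greeter",
    ("hello", "1.1"): "friendly greeter with colours",
}
# Translator input, as in a po file: English string -> translated string.
po = {"friendly greeter": "programme de salutation amical"}


def lookup_by_text(po, english):
    """gettext-style lookup: the English text is the key.

    An unknown (for example, changed) description falls back to the
    English text itself, so an outdated translation is never shown.
    """
    return po.get(english, english)


def build_package_catalog(descriptions, po):
    """Build-time check for the (name, version)-keyed alternative.

    Only translations whose English source matches the current
    description are kept; the English text is not stored in the output.
    """
    return {key: po[text] for key, text in descriptions.items() if text in po}


def lookup_by_package(catalog, descriptions, name, version):
    """Single lookup by (name, version), with fallback to English."""
    key = (name, version)
    return catalog.get(key, descriptions[key])


catalog = build_package_catalog(descriptions, po)
for name, version in descriptions:
    print(name, version, "->",
          lookup_by_package(catalog, descriptions, name, version))
```

With the text key, the fallback comes for free from the lookup itself; with
the (name, version) key, both the build step and the fallback are the extra
pieces that would have to be written and maintained.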
If you change any tool of the gettext mechanism, you lose advantages from
the translator's point of view, like the compendium, which contains standard
translations for reuse, or user-friendly translation tools like kbabel
(including ispell support, which is implemented in kbabel, and some
For what gain?
One lookup less? But gettext is cached and well optimized. I think the
change and redesign are too much for the small speedup you can get.
Smaller resulting po files? Come on, the woody+1 release will come on 6 CDs
or more, and you are talking about saving a few MB... This data will
compress well, like any natural-language text, so that's a minor problem,
from my point of view.
Un clavier azerty en vaut deux.