Hello all
After reading some more mails and writing some comments myself, IMHO it
is time to write a newer, hopefully better proposal. Not all of it is
new, but I have added some new thoughts and some parts from the comments.
In this proposal I have combined the decentralized translations with
the central repository, and all this without a delay in the
translator-to-user path.
Not all parts are set in stone. I need comments and decisions
on some parts. Maybe you can help.
One quote from a mail from Raphael Hertzog:
I find that having translations is far better that having not a
single one and refusing to add them because we can't have the perfect
solution right now.
Add Translations of the Package Description
in the Debian Distribution
(c) Michael Bramer <grisu@debian.org>
1.) always use _gettext_!
Everybody knows gettext and everybody uses it. Why should we use gettext
to add the translated descriptions to the Debian distribution? For
exactly this reason: gettext is *the* technique for translations.
Everybody knows it: you need not teach a maintainer and you need not
teach a user (an important point). If a user already uses a system with
a locale environment, he will simply get translated descriptions in
the future.
gettext does all the work, and gettext is tested (it is used in many
programs). With it you need only some small patches. (We showed a
-9/+30 patch for dselect/dpkg, and hopefully an apt patch will not be
much bigger.) gettext never shows an outdated translation (a big point)
and has other nice features (see below).
Maybe the release manager will allow this patch in woody, but
this is another story.
If apt and dpkg are patched and the user has a suitable .mo file in
/usr/share/desc-trans/<locale>/, all output of _all_ package
management programs is translated. (dpkg and APT need a patch; other
programs (like deity, etc.) use APT.)
gettext already supports fallback languages. See [1] for more
information. If I understand the gettext source code correctly,
the fallback is per message and not per .mo file. With this
someone can set LANGUAGE=hu:sk:cs and get a
Hungarian->Slovak->Czech->English fallback path. (If a
description is translated into Slovak but not into Hungarian, the user
will see the Slovak description.)
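
The per-message fallback described above can be illustrated with a small
Python sketch. It says nothing about how dpkg or APT would do it: the
DictTranslations class and the three tiny catalogs are hypothetical
stand-ins for the per-language .mo files, and the "(hu)", "(sk)", "(cs)"
strings are placeholders, not real translations. Only
gettext.NullTranslations and its add_fallback() method come from
Python's standard library.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for one language's .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = dict(catalog)

    def gettext(self, message):
        # Use this language's translation if there is one ...
        if message in self._catalog:
            return self._catalog[message]
        # ... otherwise ask the next language in the chain; the base class
        # returns the untranslated (English) message when the chain ends.
        return super().gettext(message)


# Invented sample data: each description exists in only some languages.
hungarian = DictTranslations({"text editor": "(hu) text editor"})
slovak = DictTranslations({"text editor": "(sk) text editor",
                           "mail reader": "(sk) mail reader"})
czech = DictTranslations({"web browser": "(cs) web browser"})

# Build the chain Hungarian -> Slovak -> Czech (-> English original).
hungarian.add_fallback(slovak)
hungarian.add_fallback(czech)

for description in ("text editor", "mail reader", "web browser", "file manager"):
    print(description, "=>", hungarian.gettext(description))
```

Each description is looked up on its own, so one untranslated Hungarian
entry does not switch the whole output to another language.
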
This is all nice, and we have only one problem: how will the user
get the .mo file?
First one comment on this question: you always have this problem
with the descriptions. You must download the
descriptions and the translations first. Only after that can you
use (see) them and install the real programs/packages.
For the normal (English) descriptions we use the Packages files
(with apt or dselect (the old methods)). We must use something
like this for the translations too...
2.) get the .po/.mo files on the system
If we use gettext, we must get one .mo file onto the system.
The .mo file is generated from a .po file and is itself a binary
data file. If you have several sources (like ftp.debian.de and a
local mirror with your own packages), you will have several
translations and several .mo/.po files.
The best way is to download the .po files, merge these files
with a tool, make a .mo file from this one big .po file, and use
this file. (Maybe a simple 'cat *.po > master.po' is enough; I have
not tested this yet, but this is only a technical question and
problem.)
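
A plain cat may not be enough: every .po file carries its own header
entry, the same description can appear in more than one source, and as
far as I know msgfmt refuses duplicate message definitions. Below is a
minimal sketch of the merge step in Python. It assumes the entries have
already been parsed into dicts (msgid -> msgstr); the names
merge_catalogs, po_quote and to_po are made up for this illustration,
the "(de)" strings are placeholders, and "first source listed wins" is
just one possible policy.

```python
def merge_catalogs(catalogs):
    """Merge msgid -> msgstr dicts; earlier catalogs take precedence."""
    merged = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog.items():
            # Keep the first non-empty translation seen for each msgid.
            if msgstr and msgid not in merged:
                merged[msgid] = msgstr
    return merged


def po_quote(text):
    """Quote a string for a .po file (backslash, double quote, newline)."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'


def to_po(catalog):
    """Render the merged catalog as .po text with a single header entry."""
    lines = ['msgid ""',
             'msgstr "Content-Type: text/plain; charset=UTF-8\\n"',
             ""]
    for msgid in sorted(catalog):
        lines += ["msgid " + po_quote(msgid),
                  "msgstr " + po_quote(catalog[msgid]),
                  ""]
    return "\n".join(lines)


# Invented sample data for two sources.
official = {"GNU C compiler": "(de) GNU C compiler"}
local_mirror = {"GNU C compiler": "(de) other wording",
                "our local tool": "(de) our local tool"}

master = merge_catalogs([official, local_mirror])
print(to_po(master))
```

The resulting text could be written to master.po and compiled with
msgfmt as usual.
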
I propose the dir /usr/share/desc-trans/<locale>/desc-trans.d/ to
store all .po files.
If you run apt-get update (or another function like this in
deity and co), you have (maybe) new and changed descriptions in the
apt database, and now you need a newer, better .po file. Because of
this, I propose to let apt download the .po-like file (see below)
during the update process.
What is the size of all this? OK, we now have in sid/main/i386 (see
[2]) about 7000 packages, and the descriptions of all these packages
total 2660993 bytes. That gives a description size of 384 bytes per
package. With gzip we will get (maybe) 130 bytes.
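
The numbers above can be rechecked with a few lines of Python. The
package count and byte total are taken from appendix [2]; the 130 bytes
per gzipped description is the estimate from the text, not a measured
value.

```python
# Figures from appendix [2] (sid/main/binary-i386).
packages = 6922                 # number of Description fields
description_bytes = 2660993     # size of all descriptions together

average = description_bytes / packages
print(f"average description size: {average:.0f} bytes")

# Assumption from the text: gzip shrinks a description to roughly 130 bytes.
gzipped_average = 130
total_gzipped = packages * gzipped_average
print(f"all descriptions, gzipped: about {total_gzipped / 1e6:.1f} MB per language")
```
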
With this, the size on the system is similar to that of the Packages
files from apt. If you have several sources you will have some (5-20)
megabytes in /usr/share/desc-trans/<locale>/desc-trans.d/ and a
collected .mo file per language.
But the admin of the system must pay this price if he wants to see
translated descriptions. (And it does not matter whether we use gettext
or another technique; with gettext we only have the extra .mo file.)
But what file should apt download? The first thought is maybe a
translated Packages-XX file. But the first thought is not always the
best way.
We have _now_ 316 Packages* files (see [3]) on ftp-master with 141
MByte in total. If we translate all of this into (only) 10 languages we
need 1.4 GByte, and with more packages and more languages more and
more. OK, hard disks are cheap, but not free. This is not the right
way.
A Packages file contains not only the Description. As you know, it
includes all the other tags from the control file. If we delete these
tags, put only the Description in one file, and make
Descriptions-XX files, we save 50% of the size. And if we store one
Descriptions-XX file per dist and not per arch, we save more.
With this we need only 30 Descriptions files per language [4].
This should be only 14 MByte per language (if all descriptions are
translated). These files contain only the package name and the
translated Description (and maybe the Version). The APT
process can generate the .po files from the normal Packages file
and all downloaded Descriptions files.
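
How APT could pair the two files can be sketched in Python. The format
of a Descriptions-XX file is not decided yet, so this sketch simply
assumes the same stanza syntax as a Packages file with only Package and
Description fields. The function names, the sample packages and the
"(de)" placeholder translations are all invented for the illustration.

```python
def parse_stanzas(text):
    """Parse control-file style text into a list of {field: value} dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields, last = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and last:
                # Continuation line: belongs to the previous field.
                fields[last] += "\n" + line
            elif ":" in line:
                last, _, value = line.partition(":")
                fields[last] = value.strip()
        stanzas.append(fields)
    return stanzas


def pair_descriptions(packages_text, descriptions_text):
    """Return {original description: translation} for packages found in both."""
    translated = {s["Package"]: s["Description"]
                  for s in parse_stanzas(descriptions_text)
                  if "Package" in s and "Description" in s}
    pairs = {}
    for stanza in parse_stanzas(packages_text):
        name = stanza.get("Package")
        if name in translated and "Description" in stanza:
            pairs[stanza["Description"]] = translated[name]
    return pairs


# Invented sample data in Packages syntax.
PACKAGES = """\
Package: sample-editor
Version: 1.0
Description: A small text editor
 It edits text files and nothing else.

Package: sample-untranslated
Version: 1.0
Description: A package without a translation
"""

# Hypothetical Descriptions-de content; "(de)" marks placeholder text.
DESCRIPTIONS_DE = """\
Package: sample-editor
Description: (de) A small text editor
 (de) It edits text files and nothing else.
"""

pairs = pair_descriptions(PACKAGES, DESCRIPTIONS_DE)
for original, translation in pairs.items():
    print("msgid: ", repr(original))
    print("msgstr:", repr(translation))
```

Each pair becomes one msgid/msgstr entry of the generated .po file;
packages without a translation are left out and stay in English.
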
If we don't want this process to run on the client every time, we can
produce Descriptions-XX.po files, and the client only has to download
this file and save it in the right dir. But this file will
include the original description, and with this it has double the
size and download time.
With the Descriptions-XX[.po] file the admin only has to download the
needed languages and not all languages.
As the first step (and a little hack), we can produce a
desc-trans-XX.deb with only the .po file. A user can download this
file, install it, and have translated descriptions. Once we have patched
katie etc. and have the Descriptions files on all the mirrors,
we don't need this deb and can remove it from the archive.
4.) How does katie (or the desc-trans-XX.deb) get the translation?
Katie gets the translation from the deb package itself (see next
point) or from an override file as fallback. The ddts (Debian Description
Translation Server) can produce the override file. Normally the
translator gets the untranslated description from this server and
sends the translation to this server. The server does all the
work. If a description changes, it sends mail to the translator of
this translation, sends the new description to the translator, and sends
a notification to the maintainer.
The maintainer has a veto and can remove a translation from the
ddts db. He can send improvements to the translator, etc. He is not
out of the loop. He only outsources the translations to the ddtp.
If a maintainer doesn't like the ddtp, he can translate the
description himself, find his own translators, etc. This is not a real
problem. The ddtp is only a service for the maintainer and saves work
on his side.
5.) translated descriptions in the package.
Now, this is the difficult part.
We need a way to add the translated descriptions to the normal
package. In the last mails we saw some proposals.
In private packages, or if the maintainer knows some languages and
makes the translation himself, it is a good way to include the
translation in the package. I'm not convinced that this is OK in
the normal Debian archive.
I see only one problem: the size.
We now have 80446 .deb packages and 7643 source packages in the
Debian archive on ftp-master (see [5]). If we include the translation
in the deb, we must store it in the source and in every deb package.
Check this calculation:
If every source contains only one description with 130 (gzipped)
bytes, we get 1 MByte per language. If we use po
files in the source (see below), we get 2 MBytes per language.
And if every deb package has only one description with 130 (gzipped)
bytes, this makes 10 MByte per language. If we store the
description as a po file, we will use 20 MByte per language.
That is 11/22 MByte per language; with only 10 languages we get
110/220 MBytes.
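
The same calculation as a few lines of Python, using the package counts
from appendix [5] and the estimate of 130 gzipped bytes per description.
The factor 2 for .po files is the assumption from the text that a .po
file carries the original description as well as the translation.
Without rounding each term first, the result is roughly 11.5 and 23 MB
per language, slightly above the rounded 11/22 figure.

```python
# Figures from the text and appendix [5].
source_packages = 7643
deb_packages = 80446
gzipped_description = 130       # bytes, the estimate used in the text
languages = 10


def megabytes(n_bytes):
    """Convert a byte count to (decimal) megabytes."""
    return n_bytes / 1e6


plain = (source_packages + deb_packages) * gzipped_description
as_po = 2 * plain               # a .po file also carries the original text

print(f"per language: {megabytes(plain):.1f} MB plain, "
      f"{megabytes(as_po):.1f} MB as .po")
print(f"{languages} languages: {megabytes(plain * languages):.0f} MB plain, "
      f"{megabytes(as_po * languages):.0f} MB as .po")
```
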
With more packages, ports, and languages this will grow. All these bytes
must be downloaded, uploaded, and synced over time.
And on the local system the descriptions and the translations in
all languages from each package will be stored on the local hard disk
(without gzip). Count:
With 10 languages, 1000 installed packages, and 380 bytes per
description and per translation you get an additional 4/8 MBytes on
the local disk.
Is all this useful in a 'normal' deb package from the Debian
project? Maybe yes. We must decide this. (I personally don't see a
real advantage in this. But we can add it, and I don't have a real
problem with it. I see only the size problem, and this is not a
big problem.)
In all cases I propose: store the descriptions in the source as
.po files in the /debian/ dir (one per language). This is the
only really good way to store the translations. (No encoding
problems, no outdated text, no debconf-mergetemplate hack, ...)
But how does the maintainer get the translation? We have some cases:
- The maintainer translates the description himself
- He finds his own translators (like now with debconf)
- He uses the ddtp
- He can ask the ddts and get all translations of the package
- He can use the override file of katie
- He uses the notification mails from the ddts (In future the
server will use the agreed format in this mail. With this,
the maintainer only has to copy this file into the source.)
Now the technical part:
The proposal with the biggest patch is 'put the translation in
its own element in the deb ar'. Maybe this is nice and feasible,
but it is not a fast way.
Because of this I propose several solutions:
1.) (very fast)
Put the translation as a normal .po file in the
/usr/share/desc-trans/<locale>/desc-trans.d/ dir. Finished.
This doesn't need any extra work on dpkg etc.
2.)
Put the translation in the control.tar.gz of the deb. Maybe as
desc-trans.tar.gz with all translations.
We can put this in as a real po file or as a description file (without
the original description). dpkg --info can use this and show all
included and translated descriptions.
If the package only includes the translated description (and no po
file), a gettext-like process must ensure that no outdated
translation is included in the package!
During package installation dpkg should move these files to the
/usr/share/desc-trans/<locale>/desc-trans.d/ dir. (If the
translation is not in the po file format, dpkg generates a po file
from the translation and the original description.)
3.) (the long way, if possible)
Add the desc-trans.tar.gz to the deb ar as its own new element. The
other points are as in 2.).
But this has the big advantage that some process on ftp-master can
edit the .deb on the fly and change and/or add some translations.
Maybe this has some other problems.
In all cases we should use a dh_* script. With this we can start
with 1.) and can switch to 2.) or 3.) later. And maybe this script
can fetch the translation from some source itself.
6.) Transition to a debian with translations
- We have the first translations, and the first step is a newer,
patched dpkg and apt.
Could we please have the opinion of Wichert and Jason, for dpkg and
apt, about the use of gettext for the translation of the
descriptions?!
- The next step is a decision on the format in the deb file.
- The last step is the download of the translated descriptions with
apt during the update process, and the patch of katie to produce the
Descriptions or Descriptions.po files.
Maybe we get the first step with woody and the others with woody+1.
Appendix
[1] from the ABOUT-NLS from gettext source:
...
Not all programs have translations for all languages. By default, an
English message is shown in place of a nonexistent translation. If you
understand other languages, you can set up a priority list of languages.
This is done through a different environment variable, called
`LANGUAGE'. GNU `gettext' gives preference to `LANGUAGE' over `LANG'
for the purpose of message handling, but you still need to have `LANG'
set to the primary language; this is required by other parts of the
system libraries. For example, some Swedish users who would rather
read translations in German than English for when Swedish is not
available, set `LANGUAGE' to `sv:de' while leaving `LANG' to `sv_SE'.
In the `LANGUAGE' environment variable, but not in the `LANG'
environment variable, `LL_CC' combinations can be abbreviated as `LL'
to denote the language's main dialect. For example, `de' is equivalent
to `de_DE' (German as spoken in Germany), and `pt' to `pt_PT'
(Portuguese as spoken in Portugal) in this context.
...
[2]
grisu@auric:/org/ftp-master.debian.org/ftp/dists/sid/main/binary-i386$ grep-available -s Description "" Packages|grep ^Descrip|wc
6922 48709 372777
grisu@auric:/org/ftp-master.debian.org/ftp/dists/sid/main/binary-i386$ grep-available -s Description "" Packages|wc
50806 406596 2660993
$ bc -l
2660993/6922
384.42545507078878936723
[3]
grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "Package*"|wc
316 316 15774
grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "Package*"|xargs cat|wc
3266668 14826546 148475135
[4]
unstable/main
/contrib
/non-free
frozen/main
/contrib
/non-free
frozen-proposed-updates/main
/contrib
/non-free
stable/main
/contrib
/non-free
stable-proposed-updates/main
/contrib
/non-free
[5]
grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "*deb" -type f|wc
80446 80446 4308241
grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "*tar.gz" -type f|wc
7643 7643 414031
Regards
Grisu
--
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger grisu@db.debian.org -- Linux Sysadmin -- Use Debian Linux
"Like sex in high school, everyone's talking about Linux, but is anyone
doing it?" -- Computer Currents