
new proposal: Translating Debian packages' descriptions



Hello all


After reading more mails and writing some comments myself, IMHO it
is time to write a newer, hopefully better proposal. Not everything is
new, but I have added some new thoughts and some parts from the comments.

In this proposal I have combined decentralized translations with a
central repository, all without a delay on the path from translator
to user.

Not every part is set in stone. I need comments and decisions
on some parts. Maybe you can help.

One quote from a mail from Raphael Hertzog:
 I find that having translations is far better that having not a
 single one and refusing to add them because we can't have the perfect
 solution right now.



	     Add Translations of the Package Description 
		      in the Debian Distribution

		(c) Michael Bramer <grisu@debian.org>


1.) Always use _gettext_!

   Everyone knows gettext and everyone uses it. Why should we use
   gettext to add the translated descriptions to the Debian
   distribution? For exactly this reason. Gettext is *the* technique
   for translations.

   Everyone knows it: you need not teach the maintainers and you need
   not teach the users (an important point). A user who already runs a
   system with a locale environment will simply get translated
   descriptions in the future.

   gettext does all the work, and gettext is well tested (it is used in
   many programs). With it you need only some small patches. (We have
   shown a -9/+30 patch for dselect/dpkg, and hopefully an apt patch
   will not be much bigger.) Gettext never shows an outdated
   translation (a big point) and has other nice features (see below).

   Maybe the release manager will allow this patch in woody, but
   that is another story.

   If apt and dpkg are patched and the user has a suitable .mo file in
   /usr/share/desc-trans/<locale>/, the output of _all_ package
   management programs is translated. (dpkg and APT need a patch; other
   programs (like deity, etc.) use APT.)

   gettext already supports fallback languages. See [1] for more
   information. If I understand the gettext source code correctly,
   the fallback is per message and not per .mo file. With this,
   someone can set LANGUAGE=hu:sk:cs and get a
   hungarian->slovak->czech->english fallback path. (If a
   description is translated into Slovak but not into Hungarian, the
   user will see the Slovak description.)
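
   The per-message fallback described here can be sketched in a few
   lines of Python. This is only an illustration of the idea, not
   gettext's real lookup code: the catalogs are plain dictionaries
   with placeholder translations, the names CATALOGS, language_chain
   and translate are made up for this sketch, and the language codes
   are the ISO 639 ones (sk for Slovak, cs for Czech).

```python
# Hypothetical per-language catalogs: original description -> translation.
# The bracketed strings are placeholders, not real translations.
CATALOGS = {
    "hu": {},
    "sk": {"The GNU C compiler": "[sk] The GNU C compiler"},
    "cs": {
        "The GNU C compiler": "[cs] The GNU C compiler",
        "A text editor": "[cs] A text editor",
    },
}


def language_chain(language_var):
    """Split a LANGUAGE-style value such as 'hu:sk:cs' into a priority list."""
    return [code for code in language_var.split(":") if code]


def translate(description, language_var, catalogs):
    """Return the first available translation, deciding per message."""
    for code in language_chain(language_var):
        translated = catalogs.get(code, {}).get(description)
        if translated:
            return translated
    # No catalog in the chain knows this message: show the English original.
    return description


for text in ("The GNU C compiler", "A text editor", "A mail reader"):
    print(translate(text, "hu:sk:cs", CATALOGS))
```

   The first description comes from the Slovak catalog, the second
   from the Czech one, and the third stays in English because no
   catalog in the chain has it.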


   This is all nice, and we have only one problem: how will the user
   get a suitable .mo file?

   First, one comment on this question: you always have this problem
     with descriptions. You must download the descriptions and the
     translations first. Only after that can you see them and install
     the real programs/packages.
     For the normal (English) descriptions we use the Packages files
     (with apt or dselect (the old methods)). We must use something
     like this for the translations too...


2.) Get the .po/.mo files on the system

   If we use gettext, we must get one .mo file onto the system.
   The .mo file is generated from a .po file and is itself a binary
   data file. If you have several sources (like ftp.debian.de and a
   local mirror with your own packages) you will have several
   translations and several .mo/.po files.

   The best way is to download the .po files, merge them with a tool,
   build a .mo file from the resulting big .po file, and use that
   file. (Maybe a 'cat *.po > master.po' is enough; I have not tested
   this yet, but it is only a technical question.)

   I propose the dir /usr/share/desc-trans/<locale>/desc-trans.d/ to
   store all .po files.

   If you run apt-get update (or a similar function in deity and
   co), you (maybe) have new and changed descriptions in the apt
   database, and now you need a newer, better .po file. Because of
   this, I propose that apt download the .po-like file (see below)
   during the update process.

   What is the size of all this? OK, we now have about 7000 packages
   in sid/main/i386 (see [2]), and the descriptions of all these
   packages total 2660993 bytes. That gives a description size of 384
   bytes per package. With gzip we will get (maybe) 130 bytes.
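
   The division behind the 384-byte figure can be checked with a small
   Python sketch that uses the counts from [2]. The 130 gzipped bytes
   are an estimate from the text above, not a measurement, so the
   ratio printed here is only what that estimate implies.

```python
# Counts taken from the wc output in appendix [2].
TOTAL_DESCRIPTION_BYTES = 2660993
DESCRIPTION_COUNT = 6922

# Estimated gzipped size per description (an assumption from the text).
GZIPPED_ESTIMATE = 130

bytes_per_description = TOTAL_DESCRIPTION_BYTES / DESCRIPTION_COUNT
implied_ratio = GZIPPED_ESTIMATE / bytes_per_description

print(f"{bytes_per_description:.1f} bytes per description")
print(f"implied gzip ratio: {implied_ratio:.2f}")
```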
 
   With this, the size on the system is similar to that of apt's
   Packages files. If you have several sources you will have some
   (5-20) megabytes in /usr/share/desc-trans/<locale>/desc-trans.d/
   and one collected .mo file per language.

   But the admin of the system must pay this price if he wants to see
   translated descriptions. (And it does not matter whether we use
   gettext or another technique; with gettext we only have the extra
   .mo file.)

   But what file should apt download? The first thought is maybe a
   translated Packages-XX file. But the first thought is not always
   the best one.

   We have _now_ 316 Packages* files (see [3]) on ftp-master with a
   total size of 141 MByte. If we translate all of them into (only) 10
   languages we need 1.4 GByte, and with more packages and more
   languages ever more. OK, hard disks are cheap, but not free. This is
   not the right way.

   A Packages file does not contain only the Description. As you know,
   it includes all the other tags from the control file. If we delete
   these tags, put only the Description into a file and make
   Descriptions-XX files, we save 50% of the size. And if we keep one
   Descriptions-XX file per dist and not per arch, we save more.

   With this we need only 30 Descriptions files per language [4].
   This should be only 14 MByte per language (if all descriptions are
   translated). These files contain only the package name and the
   translated Description (and maybe the Version). The APT
   process can generate the .po files from the normal Packages file
   and all downloaded Descriptions files.
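
   How the client could pair the original and the translated text into
   .po entries might look like the following Python sketch. It assumes
   both files have already been parsed into dictionaries keyed by
   package name; the function names and the "#. package:" comment line
   are inventions of this sketch, not a decided format. A real .po
   file would also carry a header entry declaring the charset, which
   is left out here.

```python
def po_quote(text):
    """Quote text as PO string lines, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    lines = escaped.split("\n")
    if len(lines) == 1:
        return '"' + lines[0] + '"'
    # Multi-line strings conventionally start with an empty string line.
    parts = ['""']
    for index, line in enumerate(lines):
        newline = "\\n" if index < len(lines) - 1 else ""
        parts.append('"' + line + newline + '"')
    return "\n".join(parts)


def po_entry(package, original, translation):
    """Build one PO entry; the original description is the msgid."""
    return (
        f"#. package: {package}\n"
        f"msgid {po_quote(original)}\n"
        f"msgstr {po_quote(translation)}\n"
    )


def build_po(originals, translations):
    """Pair each translation with the original text of the same package."""
    entries = []
    for package in sorted(translations):
        if package in originals:  # skip packages unknown to Packages
            entries.append(
                po_entry(package, originals[package], translations[package])
            )
    return "\n".join(entries)


originals = {"foo": "short summary\n long description line"}
translations = {"foo": "[de] short summary\n [de] long description line"}
print(build_po(originals, translations))
```

   Because the original description is the msgid, a translation whose
   original has changed in Packages simply no longer matches at lookup
   time.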

   If we don't want this processing on the client every time, we can
   produce Descriptions-XX.po files, and the client must only download
   such a file and save it in the right dir. But this file will
   include the original description, and so it has double the
   size and download time.

   With the Descriptions-XX[.po] file the admin must download only the
   needed languages and not all languages.

   As a first step (and little hack), we can produce a
   desc-trans-XX.deb containing only the .po file. A user can download
   this package, install it, and have translated descriptions. Once we
   have patched katie etc. and have the Descriptions files on all the
   mirrors, we no longer need this deb and can remove it from the
   archive.


4.) How does katie (or the desc-trans-XX.deb) get the translation?

   Katie gets the translation from the deb package itself (see next
   point) or from an override file as fallback. The ddts (Debian
   Description Translation Server) can produce the override file.
   Normally the translator gets the untranslated description from this
   server and sends the translation back to it. The server does all
   the work. If a description changes, it sends mail to the translator
   of that translation, sends the new description to the translator
   and sends notifications to the maintainer.

   The maintainer has a veto and can remove a translation from the
   ddts db. He can send improvements to the translator, etc. He is not
   out of the loop. He only outsources the translations to the ddtp.

   If a maintainer doesn't like the ddtp, he can translate the
   description himself, find his own translators, etc. This is not a
   real problem. The ddtp is only a service for the maintainer and
   saves work on his side.


5.) Translated descriptions in the package

   Now, this is the difficult part.

   We need a way to add the translated description to the normal
   package. In the last mails we saw some proposals.

   In private packages, or if the maintainer knows some languages and
   makes the translation himself, including the translation in the
   package is a good way. I'm not convinced that this is OK in
   the normal Debian archive.

   I see only one problem: the size.

   We now have 80446 .deb packages and 7643 source packages in the
   Debian archive on ftp-master (see [5]). If we include the
   translation in the deb, we must store it in the source and in every
   deb package.

   Check this calculation:
     If every source contains only one description with 130 (gzipped)
     bytes, we get 1 MByte per language. If we use po files in the
     source (see below), we get 2 MBytes per language.
     If every deb package has only one description with 130 (gzipped)
     bytes, this makes 10 MByte per language. If we store the
     description as a po file, we will use 20 MByte per language.
     That is 11/22 MByte per language; with only 10 languages we get
     110/220 MBytes.
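
   The calculation above, written out as a Python sketch. The package
   counts come from [5]; the 130 gzipped bytes per description and the
   factor of two for the po format (which carries the original text
   as well) are the assumptions made in the text. The exact products
   come out slightly above the rounded figures used above.

```python
DEB_PACKAGES = 80446     # .deb files on ftp-master, from [5]
SOURCE_PACKAGES = 7643   # source tarballs on ftp-master, from [5]
GZIPPED_BYTES = 130      # estimated gzipped size of one description
LANGUAGES = 10
MB = 1_000_000


def overhead_mb(package_count, po_format):
    """Extra megabytes per language for one translated description each."""
    factor = 2 if po_format else 1  # a po file also holds the original
    return package_count * GZIPPED_BYTES * factor / MB


for po_format in (False, True):
    source = overhead_mb(SOURCE_PACKAGES, po_format)
    debs = overhead_mb(DEB_PACKAGES, po_format)
    per_language = source + debs
    label = "po files" if po_format else "plain descriptions"
    print(
        f"{label}: {source:.1f} + {debs:.1f} = {per_language:.1f} MB "
        f"per language, {per_language * LANGUAGES:.0f} MB for "
        f"{LANGUAGES} languages"
    )
```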

   With more packages, ports and languages, this will grow. These bytes
   must all be downloaded, uploaded and synced over time.

   And on the local system the descriptions and the translations in
   all languages from each package will be stored on the local hard
   disk (without gzip). Count:
     With 10 languages, 1000 installed packages and 380 bytes per
     description and per translation you get an additional 4/8 MBytes
     on the local disk.

   Is all this useful in a 'normal' deb package from the Debian
   project? Maybe yes. We must decide this. (I personally don't see a
   real advantage in it. But we can add it, and I don't have a real
   problem with it. I see only the size problem, and that is not a
   big problem.)

   In all cases I propose: store the description in the source as a
     .po file in the /debian/ dir (one per language). This is the
     only really good way to store the translations. (No encoding
     problems, no outdated text, no debconf-mergetemplate hack, ...)

   But how does the maintainer get the translation? We have some cases:
    - The maintainer translates the description himself
    - He finds his own translators (like now with debconf)
    - He uses the ddtp
      - He can ask the ddts and get all translations of the package
      - He can use the override file of katie
      - He can use the notification mails from the ddts (In future the
	server will use the agreed format in these mails. The
	maintainer then only has to copy this file into the source.)
    
   Now the technical part:

   The proposal with the biggest patch is 'put the translation in
   its own element in the deb ar'. Maybe this is nice and feasible,
   but it is not a fast way.

   Because of this I propose some solutions:

   1.) (very fast)

     Put the translation as a normal .po file in the
     /usr/share/desc-trans/<locale>/desc-trans.d/ dir. Finished.

     This doesn't need any extra work on dpkg etc.

   2.)

     Put the translation in the control.tar.gz of the deb, maybe as a
     desc-trans.tar.gz with all translations.

     We can put it there as a real po file or as a description file
     (without the original description). dpkg --info can use this and
     show all included translated descriptions.

     If the package includes only the translated description (and no po
     file), a gettext-like process must ensure that no outdated
     translation is included in the package!
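
     One way such a gettext-like check could work at package build
     time is sketched below in Python. It assumes the build process
     still knows the original text each translation was made from (for
     example from a .po file in the source); the function name and the
     data layout are hypothetical.

```python
def current_translations(current_descriptions, shipped):
    """Keep only translations whose recorded original still matches.

    current_descriptions: package -> current English description
    shipped: package -> (original at translation time, translation)
    """
    fresh = {}
    for package, (original, translation) in shipped.items():
        # Same rule as a gettext lookup: a changed original no longer
        # matches, so its translation is treated as outdated.
        if current_descriptions.get(package) == original:
            fresh[package] = translation
    return fresh


current = {"foo": "new summary", "bar": "stable summary"}
shipped = {
    "foo": ("old summary", "[de] old summary"),
    "bar": ("stable summary", "[de] stable summary"),
}
print(current_translations(current, shipped))
```

     Only the translation for bar survives, because the description of
     foo has changed since it was translated.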

     During package installation dpkg should move these files to the
     /usr/share/desc-trans/<locale>/desc-trans.d/ dir. (If the
     translation is not in the po file format, dpkg generates a po file
     from the translation and the original description.)

   3.) (the long way, if possible)

     Add the desc-trans.tar.gz to the deb ar as a new element of its
     own. The other points are as in 2.).

     But this has the big advantage that some process on ftp-master can
     edit the .deb on the fly and change and/or add translations.

     Maybe this has some other problems.

   In all cases we should use a dh_* script. With this we can start
   with 1.) and switch to 2.) or 3.) later. And maybe this script
   can fetch the translation from some source itself.


6.) Transition to a Debian with translations

  - We have the first translations, and the first step is a newer,
    patched dpkg and apt.

    Could we please have the opinion of Wichert and Jason, for dpkg
    and apt, on the use of gettext for the translation of the
    descriptions?!

  - The next step is a decision on the format in the deb file.

  - The last step is the download of the translated descriptions with
    apt during the update process and the patch of katie to produce the
    Descriptions or Descriptions.po files.

  Maybe we get the first step with woody and the others with woody+1.


Appendix

[1] from the ABOUT-NLS from gettext source:
  ...
     Not all programs have translations for all languages.  By default, an
  English message is shown in place of a nonexistent translation.  If you
  understand other languages, you can set up a priority list of languages.
  This is done through a different environment variable, called
  `LANGUAGE'.  GNU `gettext' gives preference to `LANGUAGE' over `LANG'
  for the purpose of message handling, but you still need to have `LANG'
  set to the primary language; this is required by other parts of the
  system libraries.  For example, some Swedish users who would rather
  read translations in German than English for when Swedish is not
  available, set `LANGUAGE' to `sv:de' while leaving `LANG' to `sv_SE'.
  
     In the `LANGUAGE' environment variable, but not in the `LANG'
  environment variable, `LL_CC' combinations can be abbreviated as `LL'
  to denote the language's main dialect.  For example, `de' is equivalent
  to `de_DE' (German as spoken in Germany), and `pt' to `pt_PT'
  (Portuguese as spoken in Portugal) in this context.
  ...

[2]
  grisu@auric:/org/ftp-master.debian.org/ftp/dists/sid/main/binary-i386$ grep-available -s Description "" Packages|grep ^Descrip|wc
     6922   48709  372777
  grisu@auric:/org/ftp-master.debian.org/ftp/dists/sid/main/binary-i386$ grep-available -s Description "" Packages|wc  
    50806  406596 2660993
  $ bc -l
  2660993/6922
  384.42545507078878936723

[3]
  grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "Package*"|wc  
      316     316   15774
  grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "Package*"|xargs cat|wc 
  3266668 14826546 148475135

[4]
   unstable/main
           /contrib
           /non-free
   frozen/main
         /contrib
         /non-free
   frozen-proposed-updates/main
                          /contrib
                          /non-free
   stable/main
         /contrib
         /non-free
   stable-proposed-updates/main
                          /contrib
                          /non-free

[5]
  grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "*deb" -type f|wc
    80446   80446 4308241
  grisu@auric:/org/ftp-master.debian.org/ftp$ find -name "*tar.gz" -type f|wc
     7643    7643  414031


Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger grisu@db.debian.org  -- Linux Sysadmin   -- Use Debian Linux
"Like sex in high school, everyone's talking about Linux, but is anyone 
 doing it?"  -- Computer Currents
