
[l10n] Towards a central handling of D-I translations



Hola folks,

While roasting my laptop under Andalucia's sun, I had time for some
deep thinking about the future handling of D-I i18n/l10n
(globalisation==g17n), as we have now reached a major milestone with
the release of RC1, which hopefully you got out while I was away...

D-I globalisation has grown a lot in the last two years and has become
somewhat complicated for translators to handle. Moreover, over the same
period, the number of supported languages has grown: the early
translators were most often long-time Debian contributors, while some
of the recent translators are not.

This is likely to continue in the future if I achieve my goal of
grabbing more and more translators here and there: the more languages
we support, the fewer translators with deep Debian knowledge we will
have.

So, we need to simplify the g17n infrastructure as much as possible,
from the translators' point of view.

After a lot of thinking, I have identified two major things to change in
our infrastructure:

1) the translations of D-I packages are split over dozens of files, one
   per package
2) the current "3 stages" system for following translation progress
   has several weaknesses

1) Too many files
-----------------

What we call the "first stage" in translators' jargon consists of all
packages maintained by the D-I team, that is, the packages in the D-I
SVN repository.

Translators currently work on these files one by one and commit them
individually when they are changed.

This requires them to either keep a full copy of the D-I source tree or
to grab files from statistics pages.

This also complicates the work of teams using their own CVS tree such
as Arabeyes translators, Norwegian translators from Skolelinux or
Russian translators.

This also forces translators to translate the same sentence
repeatedly in several files, which most often results in inconsistent
translations from package to package.
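To illustrate the consistency problem, here is a minimal Python sketch
(not an existing D-I tool) that flags strings translated differently in
different packages. It assumes each package's debian/po/<lang>.po has
already been parsed into a simple {msgid: msgstr} dict; real PO files
also carry contexts, plural forms and fuzzy flags, which are ignored
here. The function name and the sample strings are hypothetical.

```python
from collections import defaultdict


def find_inconsistent(translations_by_package):
    """Return the msgids that got different translations in different packages.

    translations_by_package maps a package name to a {msgid: msgstr} dict,
    a simplified stand-in for that package's debian/po/<lang>.po file.
    The result maps each such msgid to {msgstr: [packages using it]}.
    """
    variants = defaultdict(dict)  # msgid -> {msgstr: [packages]}
    for package, entries in translations_by_package.items():
        for msgid, msgstr in entries.items():
            if msgstr:  # untranslated entries cannot conflict
                variants[msgid].setdefault(msgstr, []).append(package)
    return {msgid: found for msgid, found in variants.items() if len(found) > 1}


if __name__ == "__main__":
    # Hypothetical French translations of the same English string.
    po_files = {
        "countrychooser": {"Go Back": "Revenir en arrière"},
        "netcfg": {"Go Back": "Retour"},
        "partman": {"Continue": "Continuer"},
    }
    print(find_inconsistent(po_files))
    # {'Go Back': {'Revenir en arrière': ['countrychooser'], 'Retour': ['netcfg']}}
```

With a single PO file per language, such a string exists only once, so
this kind of divergence simply cannot appear.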

Finally, this eats a lot of time when I have to commit translations
myself on behalf of translators who cannot commit files themselves:
most often their files are named after the package they belong to, and
putting all these files back in the appropriate place is very tedious.

In conclusion, using a single PO file per language would be a
great improvement for all translators.

A few tools are already available for working this way: I
committed two prospective scripts in scripts/l10n-utilities before I
discovered Petter Reinholdtsen's gettext-helper script, which more or
less does the needed job and is used by the Norwegian team.

So, I have drawn up the following plan for the transition to a single
PO file for all D-I package translations:

0) write a script for merging all existing PO files into one single
   file per language in packages/po. This script has already been
   written by Petter: gettext-helper
1) write a script that collects all template strings from the packages,
   creates a general template file in packages/po, merges it with the
   existing single translation files in packages/po, and spreads the
   translations from these files back out to each package's debian/po
   directory
2) set up this script to run periodically under my account
   on people.debian.org
3) switch the French translations to this new scheme
4) test...test...test
5) progressively switch other languages to this new scheme
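The collect/merge/spread cycle of step 1 can be sketched in Python as
follows. This is only an illustration under simplifying assumptions
(PO files reduced to {msgid: msgstr} dicts, no fuzzy matching); it is
not the actual l10n-sync logic, which is documented in
installer/doc/i18n/technical.txt, and all function names are
hypothetical.

```python
def collect_templates(package_templates):
    """Build the general template: every distinct msgid, in a stable order.

    package_templates maps a package name to the msgids of its
    debian/po/templates.pot (contexts, plurals and comments are ignored).
    """
    return list(dict.fromkeys(
        msgid
        for package in sorted(package_templates)
        for msgid in package_templates[package]
    ))


def merge(template, language_po):
    """Merge the general template with a language's single PO file:
    keep known translations, add new msgids untranslated, drop unused ones."""
    return {msgid: language_po.get(msgid, "") for msgid in template}


def spread(package_templates, language_po):
    """Re-create each package's debian/po/<lang>.po from the single file."""
    return {
        package: {msgid: language_po.get(msgid, "") for msgid in msgids}
        for package, msgids in package_templates.items()
    }


if __name__ == "__main__":
    templates = {
        "netcfg": ["Hostname:", "Go Back"],
        "countrychooser": ["Choose a country", "Go Back"],
    }
    fr = {"Go Back": "Revenir en arrière", "Obsolete string": "Chaîne obsolète"}
    fr = merge(collect_templates(templates), fr)  # one entry per distinct msgid
    per_package = spread(templates, fr)
    print(per_package["netcfg"])
    # {'Hostname:': '', 'Go Back': 'Revenir en arrière'}
```

Because "Go Back" exists only once in the single file, translating it
once updates both packages. A real implementation would rather rely on
the gettext tools (msgcat, msgmerge), which also keep obsolete entries
and fuzzy matches instead of silently dropping them.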

The new script will be named l10n-sync and has been (or will soon be,
depending on when this mail goes out) committed to
scripts/l10n-utilities. Its logic is described in one of the new
documentation files I have also committed in installer/doc/i18n (the
file is "technical.txt").

At the end of the migration, translators will only need to work on
files in packages/po and can simply forget about all other files.

Developers will no longer need to care about debconf-updatepo and
similar tools when they change or add strings to templates.

The l10n-sync script will handle all the magic for updating PO files
in debian/po as well as syncing them with translators work from
packages/po.

Another script will be available to developers so that they can
manually sync the PO files for one single package, usually before
releasing the package.

The handling of debian/changelog content will not change: the
scripts/l10n-changes/output-l10n-changes script will still be
usable. It will be called from a more general script, designed for
package maintainers who wish to update their package's translations
immediately.

I plan to install the l10n-sync script while I'm still on holidays,
from Aug 16th to Aug 22nd, which leaves me plenty of time to closely
follow its work while I switch the French translations to this new
scheme.

During the Aug 23-Aug 30 week, I'll try to get a few more languages
handled by the l10n-sync script: most probably the languages handled
by skilled translators who will be able to cope with possible
messes... :-)

At the same time, I will work with Dennis Stampfer on adapting the
translation statistics infrastructure to this new work method (see the
second part of this mail).

Finally, during September, all languages will be switched to the new
system.

If this system appears to work well, the switch may happen earlier,
depending on the next release process.

2) Weaknesses of the "3 stage" system
-------------------------------------
Over the last few months, we invented the "three stage" system for
following translation statistics.

This is because a fully translated installation process requires more
than translating the D-I packages themselves. Several "regular" Debian
packages are involved in the installation process, most of them
needing translation of the screen(s) they may show to users.

Some of these packages are maintained by the D-I team or by regular
D-I contributors (base-config, tasksel, popcon...), some others
aren't.

So, with Dennis Stampfer (and Denis Barbier previously), we grouped
the translation statistics in the statistics pages as follows:

- 1st stage : all things shown during the first step of the
              installation, before the reboot
- 2nd stage : all things shown or possibly shown after the reboot
              involving some user input
- 3rd stage : all things shown or possibly shown to users during
              the installation, not involving user input

This introduced some progression into translators' work, as
translators obviously needed to complete the 1st stage before the 2nd
stage and then the 3rd stage.

However, this scheme has currently several weaknesses:

1) the name "stage" is poorly chosen. There aren't 3 stages in the
   installation process. This name is inherited from the time when we
   talked about only 2 stages, with stage 2 only including base-config
   and tasksel

2) for several reasons, we have put in the 2nd stage things which
   actually belong to the 1st stage: this is the case for the iso-codes
   translations (country names), which are shown by countrychooser but
   are currently counted in the "2nd stage".
   As a consequence, some languages for which we claim 100%
   translation may still show English in countrychooser's screens

3) we currently do not take into account the status of "2nd stage"
   translations when publishing our translation statistics. Some
   languages for which we claim 100% translation do not have
   translations for base-config or shadow screens, for instance.
   This may confuse users who expect their language but get English
   in some of the second stage steps. We already had reports
   about this.

4) The second stage currently includes very different things: some
   packages for which translations are nearly mandatory, such as
   base-config, shadow debconf or tasksel, and some things which have
   no real consequence on what is shown to users (pppconfig, which is
   very rarely used, or shadow program translations, which are not
   used at all...)

5) Translators sometimes have little indication of which package
   or package part should be translated first: for instance,
   several of them have spent hours translating shadow programs while
   tasksel or iso-codes remained untranslated

6) Statistics are currently built from most packages' CVS or SVN
   repositories. This is good for giving translators a clear idea of
   the work they still have to do. However, this may give a false idea
   of the real translation status if some committed translations have
   not reached the archive yet.

As a consequence, the real translation status for each language is
sometimes hard to assess, most often because of the mix between 1st
and 2nd "stage" translations.

In conclusion, we need to rearrange the way we currently build our
statistics so that they better reflect the real translation status
that our users will see. We also need to be able to say how
many translations are complete for the whole installation process, in
addition to the statistics we currently publish for the "core" Debian
Installer.

For this, my plan is the following:

1) Rename "stages" to "levels"

2-5) reorganise things between levels so that they better reflect the 
     progressive translation process

6) Except for the first level, give two statistics: the status of
   translated/committed material as well as the status of translations
   in the Debian archive
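A minimal Python sketch of the double statistics of point 6, assuming
per-package counts of translated strings are available both from the
CVS/SVN repositories and from the package versions in the archive. The
PackageStats structure and level_stats function are hypothetical, and
the translated counts in the example are invented; only the template
sizes (25 strings for shadow debconf, 109 for tasksel) come from the
level list below.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    """Hypothetical per-package, per-language counts."""
    total: int          # strings in the package's template
    in_repository: int  # translated strings committed to CVS/SVN
    in_archive: int     # translated strings in the version in the Debian archive


def level_stats(packages):
    """Return (committed %, archive %) for one level, weighted by string count."""
    total = sum(p.total for p in packages)
    if total == 0:
        return 0.0, 0.0
    committed = sum(p.in_repository for p in packages)
    archived = sum(p.in_archive for p in packages)
    return 100.0 * committed / total, 100.0 * archived / total


if __name__ == "__main__":
    level2_sample = [
        PackageStats(total=25, in_repository=25, in_archive=0),      # shadow (debconf)
        PackageStats(total=109, in_repository=109, in_archive=109),  # tasksel
    ]
    committed, archived = level_stats(level2_sample)
    print(f"{committed:.1f}% committed, {archived:.1f}% in the archive")
    # 100.0% committed, 81.3% in the archive
```

Percentages are weighted by string count rather than averaged per
package, so a large package such as iso-codes weighs more than
popularity-contest; averaging per package would be another possible
choice.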
 
I have chosen to define 7 levels of translation status. First of all,
this is a number with a strong symbolic meaning. I have probably been
influenced by my holidays in a place where the three monotheistic
religions coexisted peacefully for hundreds of years... :-)

These 7 levels will be the following:

level 1 : all core D-I packages
          Total: 1180 strings
level 2 : all non-core D-I material involved in user interaction
          screens during a *default priority* installation of
          a Debian base system with default choices:
          - base-config (programs and debconf) :   7+112 = 119 strings
          - shadow (debconf)                   :            25 strings
          - tasksel (programs, debconf, tasks) : 2+5+102 = 109 strings
          - iso-codes (iso_3166)               :           404 strings
          - console-data (debconf)             :            89 strings
          - exim4 (debconf)                    :            63 strings
          - popularity-contest (debconf)       :             7 strings
          Total: 816 strings
level 3 : all non-core D-I material involved in user interaction
          screens during any type of installation of a Debian base
          system. This includes rarely used packages and packages
          which may display their screens under certain circumstances:
          - discover1 (debconf)                :            11 strings
          - aptitude (programs)                :           723 strings
          - pppconfig (programs)               :           135 strings
          - console-common (debconf)           :            26 strings
          - dictionaries-common (debconf)      :            28 strings
          - pcmcia-cs (debconf)                :            30 strings
          Total: 953 strings
level 4 : all packages which may display messages on screen
          during any type of installation of a Debian base system:
          - discover1 (program)                :            83 strings
          - dpkg (program)                     :          1006 strings
          - apt (program)                      :           459 strings
          - shadow (program)                   :           464 strings
          Total: 2012 strings
level 5 : all Debian base system packages (debconf+programs)
level 6 : all Debian packages of priority Standard (debconf+programs)
level 7 : all other Debian packages (debconf+programs)

Obviously, levels 5 to 7 are currently very fuzzy, and level 7 is
completely unreachable (let's see it as a kind of translators' Grail...)

So, all this means splitting the translation statistics into four
real levels, for a total of nearly 5000 strings.
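As a quick check of that total, the per-level figures from the list
above can simply be re-added; this Python snippet uses only the counts
given there.

```python
# String counts copied from the level list above.
LEVELS = {
    1: {"core D-I packages": 1180},
    2: {"base-config": 7 + 112, "shadow (debconf)": 25, "tasksel": 2 + 5 + 102,
        "iso-codes": 404, "console-data": 89, "exim4": 63,
        "popularity-contest": 7},
    3: {"discover1 (debconf)": 11, "aptitude": 723, "pppconfig": 135,
        "console-common": 26, "dictionaries-common": 28, "pcmcia-cs": 30},
    4: {"discover1 (program)": 83, "dpkg": 1006, "apt": 459,
        "shadow (program)": 464},
}

level_totals = {level: sum(counts.values()) for level, counts in LEVELS.items()}
print(level_totals)                # {1: 1180, 2: 816, 3: 953, 4: 2012}
print(sum(level_totals.values()))  # 4961
```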

This also means that after the split, we will be able to publish the
statistics for the first two levels, and these will give a real idea of
the status of the translations for the whole installation process. We
could even consider publishing the statistics for all 4 levels, though
this may be a bit confusing.

This will also give clearer credit to the translators and translation
teams who currently have a really complete Debian installation.

During the next weeks, I will work together with Dennis Stampfer on
building a new translation statistics web site with the new
scheme. This will be a bit tricky, as some packages, such as shadow,
have part of their i18n material in one level and the rest in another.
The double statistics may also be a bit tricky.

These changes will happen while the old system keeps working. They
will be made in parallel with the changes to the core D-I packages
translation system, and they will be tested on a few languages
first.

New documentation
-----------------

I have already committed new documentation for the translation
process, which takes this new scheme into account. The new document is
in installer/doc/i18n. It is an XML document named "i18n.xml". A very
small build script is provided for compiling it to an HTML
document. I'm still learning about DocBook and XSL, and I will
probably soon provide a better build script that can build a text
file as well.

The new document includes parts for translators as well as parts for
maintainers. All D-I contributors, and more particularly translators,
are invited to read it carefully.

This is a very long and detailed document, as I have tried to put
as much information in it as possible.

The document is based on the old "translation.txt" file; references
to that file should now be gradually removed from the D-I web site(s)
and documentation.

-- 