
Re: new proposal: Translating Debian packages' descriptions



Sorry to screw up the threading; thought I'd posted this already, then
deleted grisu's message before finding that I hadn't sent this :(

On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote:

> Not all parts are set in stone. I need some comments and decisions
> on some parts. Maybe you can help.
> 
> One quote from a mail from Raphael Hertzog:
>  I find that having translations is far better than having not a
>  single one and refusing to add them because we can't have the perfect
>  solution right now.

However we do need to make sure that whatever we do right now can be
migrated fairly painlessly to the perfect solution, whatever that may be.

> 1.) use _gettext_ all the time!

And don't forget that "the perfect solution" will involve translations
of all relevant text in a package, not just descriptions. I'm not saying
that anyone has forgotten this, just that a lot of the thought in these
threads seems to have been aimed at getting translated descriptions, and
that all would do well to remember the final aim at all times...


> 2.) get the .po/.mo files on the system
> 
>    If we will use gettext, we must get one .mo file on the system.

I'm not familiar with gettext, but I would suggest that this is incorrect.
I'd have thought that you would eventually need (at least) one .mo per
package, and several others (such as for package descriptions).
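For illustration only, here is a minimal sketch of how gettext keeps one
catalogue (.mo file) per "domain", using Python's standard gettext module.
The domain names and the locale directory are assumptions made up for this
example; nothing in the proposal fixes them.

```python
import gettext

# Assumed locale directory; not a path proposed anywhere in this thread.
LOCALEDIR = "/usr/share/locale"

def translator_for(domain, language):
    """Return a translation function for one gettext domain (one .mo file)."""
    # gettext looks for LOCALEDIR/<language>/LC_MESSAGES/<domain>.mo.
    # fallback=True gives a pass-through catalogue when no .mo is found,
    # so untranslated text is returned unchanged.
    catalogue = gettext.translation(
        domain, localedir=LOCALEDIR, languages=[language], fallback=True
    )
    return catalogue.gettext

# One domain per package, plus a separate one for descriptions.
# Both domain names are hypothetical.
tr_pkg = translator_for("example-package", "de")
tr_desc = translator_for("example-descriptions", "de")

print(tr_pkg("File not found"))
```

Each call loads a different .mo, which is why a single system-wide .mo is
not required.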

>    I propose the dir /usr/share/desc-trans/<locale>/desc-trans.d/ to
>    store all .po files. 

You've forgotten that in the end we'll not just be talking about
descriptions, haven't you?

>    If you run an apt-get update (or another function like this in
>    deity and co), you have (maybe) new and changed descriptions in the
>    apt database. And now you need a newer, better .po file. Because of
>    this, I propose to download the .po-like file (see below) with apt
>    during the update process. 

Does the user actually ever need the .po? I thought you said that the
.mo was generated from the .po, and then the .mo is used.

>    What is the size of all this? Ok. We now have in sid/main/i386 (see
>    [2]) 7000 packages, and the descriptions of all these packages total
>    2660993 bytes. We get a description size per package of 384 bytes.
>    With gzip we will get (maybe) 130 bytes. 

Whoa there. I guess this is as good a point as any for me to "go off on one".

This is not directed at this comment in particular, but a great many of the
posts in these threads that I've been reading seem to be overly worried
about size. Stop and think about it. If you're going to have translations,
they will take up space, somewhere. That's just life.

Now, think about the structure of where they should/could go, and the
relationships between source, binary, and text data. Think databases.
Think normalization.

The text data in any one of an *arbitrary* number of languages is related
to the package, but you'd normally normalize it out into a separate
table in your database - you don't want to have your packages' source and
binary records growing to arbitrary sizes as arbitrary numbers of
translations are added to them.

So you probably don't usually want the translations to be part of the
package sources or binaries. They're logically separate, and should usually
be physically separate (as physically as we ever get in this sense).
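To make the normalization point concrete, here is a small sketch using
Python's sqlite3 module. The table and column names are invented for the
example and are not part of any Debian tool; the point is only that adding
a language adds rows to a separate table and leaves the package record alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per package version; never grows as languages are added.
    CREATE TABLE package (
        name    TEXT NOT NULL,
        version TEXT NOT NULL,
        PRIMARY KEY (name, version)
    );
    -- One row per (package version, language); grows independently.
    CREATE TABLE translation (
        name        TEXT NOT NULL,
        version     TEXT NOT NULL,
        lang        TEXT NOT NULL,
        description TEXT NOT NULL,
        PRIMARY KEY (name, version, lang),
        FOREIGN KEY (name, version) REFERENCES package (name, version)
    );
""")

conn.execute("INSERT INTO package VALUES ('hello', '1.0')")
conn.executemany(
    "INSERT INTO translation VALUES (?, ?, ?, ?)",
    [
        ("hello", "1.0", "en", "friendly greeter"),
        ("hello", "1.0", "de", "freundlicher Begruesser"),
    ],
)

def description(name, version, lang):
    """Look up one translated description, or None if there is none."""
    row = conn.execute(
        "SELECT description FROM translation"
        " WHERE name = ? AND version = ? AND lang = ?",
        (name, version, lang),
    ).fetchone()
    return row[0] if row else None

print(description("hello", "1.0", "de"))
```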

Gettext abstracts the *idea* that is being communicated from the text used
to communicate it. That leaves the actual text used as an overlay, metadata.

So, we need to structure the repositories in such a way that the structure
of the data is respected. It also happens that this conveniently allows
for separation of areas of maintainer/translator expertise (and also
responsibility).

Packages as prepared by a maintainer need to contain text (.mo) for at
least one language; probably usually English, but once this works there'll
be no good reason for that to be the case. Translations of a package would
logically be in another file (we have .dsc, .deb, .tar.gz, .diff already
describing logically different aspects of a package, so there's no problem
adding .trans or similar). The exact best method to store these is open to
question, but I'd guess that it would be another section in the archive,
as sources and binaries are split now.

So apt, for example, would be told what to do with a line such as:

deb-trans http://www.debian.org/debian potato/de main contrib non-free


Which would be able to provide Packages files created from the various
translation packages. Multiple versions of the same package would be
dealt with in the same way as currently.
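As a rough sketch of what a tool might do with such a line, the following
splits it into its parts and builds one Packages URL per component. The
"deb-trans" type is only the suggestion made above, and the
dists/<dist>/<component>/trans-<lang>/ layout is purely an assumption,
modelled on the existing binary-<arch> directories.

```python
from typing import List, NamedTuple

class TransSource(NamedTuple):
    uri: str
    dist: str
    lang: str
    components: List[str]

def parse_deb_trans(line):
    """Parse 'deb-trans URI DIST/LANG COMPONENT...' into a TransSource."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "deb-trans":
        raise ValueError("not a deb-trans line: %r" % line)
    dist, sep, lang = fields[2].partition("/")
    if not sep or not dist or not lang:
        raise ValueError("expected DIST/LANG, got %r" % fields[2])
    return TransSource(fields[1].rstrip("/"), dist, lang, fields[3:])

def packages_urls(src):
    """Build Packages URLs; the trans-<lang> directory is hypothetical."""
    return [
        f"{src.uri}/dists/{src.dist}/{comp}/trans-{src.lang}/Packages"
        for comp in src.components
    ]

src = parse_deb_trans(
    "deb-trans http://www.debian.org/debian potato/de main contrib non-free"
)
for url in packages_urls(src):
    print(url)
```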

It also allows certain mirrors to provide certain sets of translations,
which will certainly be a Good Thing. And CD sets could easily include
one extra CD which provided the translation section of the archive for
whatever languages are required (OK, for the initial install there would
need to be a little more jiggery-pokery).


Exceptions and trickery needed:

  1) to ensure that versions provided within a package can take
     precedence over external ones, if so desired.
  2) to enable merging of external translation files into a single
     package (not so much for use in the main archive, but for Fred Bloggs
     to mail to his mates, for example).
  3) I'm sure there are more...
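Points 1 and 2 could be sketched roughly as below. This is only one way it
might work: the dictionaries stand in for whatever the real catalogues
would be, and every name here is made up for the example.

```python
def merge_external(*catalogues):
    """Merge external translation catalogues; later ones win on conflict."""
    merged = {}
    for catalogue in catalogues:
        merged.update(catalogue)
    return merged

def pick_description(package, lang, bundled, external, untranslated,
                     prefer_bundled=True):
    """Choose a description: bundled and external translations are tried
    in the configured order, then the untranslated text is the fallback."""
    key = (package, lang)
    order = (bundled, external) if prefer_bundled else (external, bundled)
    for source in order:
        if key in source:
            return source[key]
    return untranslated[package]

# Sample data, keyed by (package, language). All values are invented.
bundled = {("hello", "de"): "de text shipped in the package"}
external = merge_external(
    {("hello", "de"): "de text from archive", ("hello", "fr"): "fr text, old"},
    {("hello", "fr"): "fr text, new"},
)
untranslated = {"hello": "original text"}

print(pick_description("hello", "de", bundled, external, untranslated))
print(pick_description("hello", "fr", bundled, external, untranslated))
```

The prefer_bundled switch is the "if so desired" part of point 1.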


>    A Packages file contains not only the Description. You know, it
>    includes all the other tags from the control file. If we delete these
>    tags and put only the Description in one file and make
>    Descriptions-XX files, we save 50% of the size. And if we save one
>    Descriptions-XX file per dist and not per arch, we save more.

Packages files in the translation sections of the archive would only
need to contain language-independent information.

<troll?>
Although what happens when we need translated package names I'm not
sure. Actually, I don't think it'd be too difficult.
</troll?>

>    With this we need only 30 Descriptions files per language [4].
>    This should be only 14 MByte per language (if all descriptions are
>    translated). These files have only the package name and the
>    translated Description (and maybe the Version) in them. The APT
>    process can generate some .po files from the normal Packages file
>    and all downloaded Descriptions files. 

This is not a good way to go; see above.

>    If we don't like this process on the client all the time, we can
>    produce Descriptions-XX.po files and the client must only download
>    this file and save it in the right dir. But this file will
>    include the original description, and with this it has double the
>    size and download time.

Bad. A translated Packages file should be available in the archive section
providing translations, as above. Where present, it overrides the basic
untranslated descriptions etc.

>    With the Descriptions-XX[.po] file the admin must only download the
>    needed languages and not all languages.

This is an essential feature of *every step* of the translation process.


> 5.) translated descriptions in the package. 
> 
>    Now, this is the difficult part.
> 
>    We need a way to add the translated description in the normal
>    package. In the last mails, we see some proposals. 
> 
>    In private packages, or if the maintainer knows some languages and
>    makes the translation himself, it is a good way to include the
>    translation in the package. I'm not convinced that this is OK in
>    the normal Debian archive. 

Agreed.

>    I see only one problem: the size. 

I see only one problem: it's just a horribly bad thing to do ;)

>    But how does the maintainer get the translation? We have some cases:

Maintainers should *never* *need* the translation. Translations are logically
*entirely* separate from the package itself. So what the hell does the
maintainer need them for? Unless the maintainer originates them, they
don't need them. Anyone claiming that they're going to keep track of
goodness-knows-how-many translations, which may be updated in bits
here and there, is IMHO overly vain and talking bollocks. Maintainers might
want to be notified when translations for their packages are added to the
official Debian archive (most likely only for certain languages), but I
reckon most will give up after the shower of mails they'd end up getting.

Using this system, the maintainer could have a veto over translations in
the official Debian archive (if anyone bothered to set up a notification
mechanism), but not elsewhere. For example, there'd be nothing to stop
me setting up a "comedy" translation in English, taking the piss out of
various packages, or with all the normal text passed through one of the
"filters" package's filters, for use by anyone who felt like adding it
to their apt sources.

In fact I think the last example there proves that this is a Good Way of
doing it; unintentional but useful or humorous side-effects like that are
generally a good sign.


I haven't really gone into which of the currently proposed steps would
best lead into this kind of scenario in future, but I thought that could
wait until you've all agreed that this is a really cunning plan and
definitely the best way to do it ever. You know you want to ;)


Cheers,



Nick


P.S. I've been trying not to get involved in this thread in the hope that it
would come up with the Right Answer and go away, but...

-- 
Nick Phillips -- nwp@lemon-computing.com
If you sow your wild oats, hope for a crop failure.


