
Re: advice on po/pot/po4a layout etc.

Hello Ian,
I am answering, at least partially, from the point of view of a
translator and package maintainer who has also internationalized his
own program (albeit on a much smaller scale).

On Wed, Sep 12, 2018 at 09:48:56PM +0100, Ian Jackson wrote:
> Hi.  I hope this is a suitable list for my question.  If not, please
> direct me elsewhere...

At least some experts are on this list, so I believe it is a good
place to ask.

> I am doing the i18n for a package (src:dgit) which I think it will be
> useful to translate (at least, much of it).  It's a Debian native
> package containing mostly perl scripts.
> I'm not sure of the best approach.  My main questions:
> 1. There doesn't seem to be any standard set of Makefile machinery to
> include, or anything.  Do I really have to write my own make rules to
> run xgettext etc. ?  I looked at the debconf source package, which
> seemed like it would be a good example, and it had its own rules.  I
> can write my own rules if that is best; they're not huge.  It just
> seemed a bit wheel-reinventish.

You might have a look at dpkg / apt as well, but I agree that there
does not seem to be a ready-to-use, plug-in Makefile. On the other
hand, I did not find po4a overly complicated, and looking at good
examples let me write the necessary files rather quickly, especially
with the help of the po4a man pages.

> (NB that I don't want to introduce use of automake into what is
> currently a simple "upstream" Makefile; if it comes to that I would
> prefer just to write my own rules for this.)

Most of my work comes from exactly this, so your use case appears even

> 2. I am unsure of the best layout of the .pot, .po, po4a, etc., files.
> The convention I saw in src:debconf was to have a directory `po'
> containing a single `debconf.pot', all the message translations
> LANG.po, and the corresponding Makefile and script machinery.  I
> dislike the idea of mixing up files edited by translators with make
> machinery, but I can tolerate it if it's conventional and would
> disturb people if I did it differently.

This is the standard layout. If a new translator appears, (s)he
will look for the pot file, copy it to, say, de.po, and start
working on it. (I'm not an expert on tools like Weblate, but they
probably do similar things for their online interfaces.)
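
As a rough sketch of that first step: the file name dgit.pot, the sample
message and the temporary directory standing in for po/ are assumptions
made for illustration only. msginit (from GNU gettext) fills in the
header for the chosen language; a plain copy is the fallback when it is
not installed.

```shell
#!/bin/sh
# Sketch: start a new German translation from the template.
# "dgit.pot" and the sample message are hypothetical.
set -e
podir=$(mktemp -d)    # stand-in for the package's po/ directory

cat > "$podir/dgit.pot" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "unknown option"
msgstr ""
EOF

# Prefer msginit, which adapts the header to the target language;
# otherwise fall back to the plain copy many translators do by hand.
if ! { command -v msginit >/dev/null 2>&1 &&
       msginit --no-translator --locale=de \
           --input="$podir/dgit.pot" \
           --output-file="$podir/de.po" 2>/dev/null; }; then
    cp "$podir/dgit.pot" "$podir/de.po"
fi

echo "new translation file: $podir/de.po"
```

Either way the translator ends up with a de.po next to the template,
which is why keeping the .pot file in the po directory matters.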

> In src:debconf I also saw po4a in use.  The translations were all in
> doc/man/po4a/po/LANG.po, and there was also
> doc/man/po4a/add_LANG/addendum.man.LANG.  This all seemed a bit ad
> hoc.

> Is there a standard layout ?  See also my next question, which may
> influence the answer to this one.

This is the way it is designed, but you can do it differently, e.g.
dpkg has a single man/po directory with all .po and .add files in it.

> Relatedly, how do automatic translation coverage tools (we have those
> I think?) deal with the variety of different possible layouts ?

Hopefully people with background on our i18n machinery can answer this
in more detail.

> 3. I am not sure how to divide up my translation inputs (pot files).
> My single source package generates two binary packages.  The two
> binary packages are rather different; they perform different roles
> (although they work well together) and have different (but
> overlapping) audiences.
> This might reasonably influence the way the messages from the two
> packages (really, the two programs) are translated.  So maybe I should
> have two .pot files for the two sets of messages.
> But the programs share a small set of library code.  The library code
> does not have many messages, but there are some.  These messages
> should be translated only once.  So if I split it up, there would have
> to be *three* .pot files for messages: dgit, git-debrebase and common.

From a translator's POV, try to avoid too many pot files. Translation
teams are usually understaffed, so if you really must split the
files, do so by importance and label them "level 1", "level 2" or
"prio 1" and "prio 2". This will guide translators. But if you want
consistency, having a few files (or even a single file) might be best.

> (I think I can use tools like xgettext and msgcat, with appropriate
> make runes, to handle any arbitrary organisation of .pot files that I
> decide on.)


> The need for splitting up is perhaps more acute for the documentation.
> I will use po4a for that.  (po4a has a powerful system for handling
> almost arbitrarily strange layouts.)
> The git-debrebase package has its own data model and conceptual model,
> and its documentation is carefully written to talk about that in the
> right terms.  Additionally, perhaps it is useful for a translator to
> know whether a string they are translating is part of a reference
> manual or a tutorial.

Giving context information to translators is always good. You can add
annotations via po4a, so guiding translators is appreciated.

> But src:debconf does not split like this so maybe it is not useful ?
> Or maybe it is even harmful because it might involve duplicating
> certain "framework" parts or something ?

Try to avoid duplicating strings; this is really bad for translators.
Splitting the documentation translations might make sense, as the
documentation is usually much larger than the program messages.
If a translator encounters a 100-string file for an important tool,
(s)he might start and finish much more quickly (including review on
the translation list) than, say, for a 500-string file.

> 4. Terminology in translations.
> As I say, one of the two packages has a specific conceptual model.
> That has its own terminology, which is defined in a section 5 manpage.
> It is important that if and when this is translated, thought is given
> to what translated names to give for each of the English terms; and
> that this settled terminology is then used consistently throughout all
> of the documentation.
> Also, the terminology appears, in some cases, as protocol elements
> (which are in text and may be displayed to the user).  These obviously
> cannot be translated or things will break.  So I think, ideally, when

Add hints for translators about what to translate and what not. Please
note, however, that once your program is internationalized, some
strings might get translated anyway: e.g. if you ask the user a yes/no
question, the answer might be in the language of the user.
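
One common way to pass such hints along for program messages is a
tagged comment in the source that xgettext copies into the .pot file.
The following is only a sketch: the file name tool.pl, the sample
string and the "TRANSLATORS:" tag are assumptions, not taken from dgit.

```shell
#!/bin/sh
# Sketch: extract a translator hint together with the message.
# File names and the sample string are hypothetical.
set -e
work=$(mktemp -d)

cat > "$work/tool.pl" <<'EOF'
# TRANSLATORS: "upstream" is a protocol keyword; keep it in English.
my $msg = gettext("no upstream branch found");
EOF

if command -v xgettext >/dev/null 2>&1; then
    # --add-comments copies comments that start with the given tag and
    # directly precede a marked string into the .pot as "#." lines.
    xgettext --language=Perl --from-code=UTF-8 \
        --add-comments=TRANSLATORS: \
        --output="$work/tool.pot" "$work/tool.pl"
    cat "$work/tool.pot"
else
    echo "xgettext not installed; skipping extraction" >&2
fi
```

Translators then see the hint next to the string in their PO editor,
without having to read the source.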

> the terms are defined in the section 5 manpage, the English words
> should be stated alongside the translated ones.

I like the idea.

> Can I (should I) leave a note to translators about these issues ?
> The relevant documents are in perl pod format.

Yes, please do. 

> 5. Translation priority
> Obviously translators are volunteers and will work on what they feel
> is most important.  But I think some parts are much more important to
> translate than others:
> These tools, particularly dgit, are useful within Debian but also,
> IMO, extremely useful outside it.  Different people will use it in
> different ways.
> This is reflected in the documentation.  Some of the documentation is
> aimed at users and downstreams; whereas some is aimed primarily at
> Debian maintainers for whom it is less important to have translations
> since much of the rest of their work has to be done in English.
> Is there a sensible way to inform translators about this kind of
> thing, so that they can spend their time wisely ?  I think maybe I
> would like to tag some documents as high, medium, or low priority, or
> something.

If you annotate the initial strings with this information ("the target
audience of this is a Debian developer / a random user"), then I believe
translation teams can and will handle the priority by themselves. But
note that some translators simply like the program and will translate
everything irrespective of priority.

> 6. Committing the .pot file
> AFAICT it is conventional for the .pot file(s), generated
> automatically from the source code with xgettext, to be included in
> source packages, git repos, etc.
> That seems odd.  What is the reason for this ?  Can I sensibly diverge
> from this and expect translators etc. to run a build rune to get the
> .pot files ?

Please don't. Translators are not developers, and they usually will
not run any build tools. Online translation tools, l10n monitoring
scripts etc. will usually not work without the pot file either. Simply
update the pot file whenever your strings change; the rest is covered
by the traditional translation machinery.
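
A minimal sketch of such an update step, as a make rule might run it.
The names tool.pl, tool.pot and the sample strings are assumptions for
illustration; only xgettext and msgmerge from GNU gettext are used.

```shell
#!/bin/sh
# Sketch: regenerate the template and refresh an existing translation.
# All file names and messages are hypothetical.
set -e
work=$(mktemp -d)
mkdir "$work/po"

# Source after a change: "nothing to do" is a newly added message.
cat > "$work/tool.pl" <<'EOF'
my $m1 = gettext("push failed");
my $m2 = gettext("nothing to do");
EOF

# Existing translation that only knows the older message.
cat > "$work/po/de.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "push failed"
msgstr "Hochladen fehlgeschlagen"
EOF

if command -v xgettext >/dev/null 2>&1 &&
   command -v msgmerge >/dev/null 2>&1; then
    # 1. Regenerate the template; this is the file to commit.
    xgettext --language=Perl --from-code=UTF-8 \
        --output="$work/po/tool.pot" "$work/tool.pl"
    # 2. Carry old translations over and add the new, untranslated entry.
    msgmerge --quiet --update --backup=none \
        "$work/po/de.po" "$work/po/tool.pot"
    cat "$work/po/de.po"
else
    echo "gettext tools not installed; skipping" >&2
fi
```

The maintainer runs this and commits the results, so translators and
monitoring scripts only ever deal with ready-made .pot and .po files.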

This might not be the best way, but this is the working way.

> I was surprised not to find answers to my questions in the
> documentation for gettext, etc.  Am I missing some best practice
> guide ?
> All advice and opinions gratefully appreciated.

I hope the answers help you. If you need more information, do not
hesitate to ask.


      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/
