Re: Work on a centralized infrastructure for i18n/l10n
> I usually split the translation process as:
>
>        string
>      extraction               translation
>    ------------->           ------------->
>                   Database                 Translator
>    <-------------           <-------------
>      displaying               translated
>         tool                    strings
Sure, everybody does.
> * Some string extraction tools exist: xgettext, poxml, po4a, xml2po,
>   some tools generating XLIFF
Fair enough. Any textual data can be converted to any kind of format.
> * The database can be a PO file, an XLIFF file, or (why not) another
>   database.
Definitely, any placeholder with a structure is good.
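To make the comparison concrete, here is the same unit in both formats (a sketch; the source reference and the id are illustrative, not taken from any real catalog). First as a PO entry:

```
#: src/open.c:42
msgid "Unable to open file"
msgstr "Impossible d'ouvrir le fichier"
```

and as an XLIFF 1.2 trans-unit:

```xml
<trans-unit id="open.c-42">
  <source xml:lang="en">Unable to open file</source>
  <target xml:lang="fr">Impossible d'ouvrir le fichier</target>
</trans-unit>
```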
> * The translation tool is a tool able to deal with the format of the
>   database and help the translator (merging old translations, displaying
>   the strings that need to be updated, etc.)
Sounds like a lecture for freshmen CAT tool users, but go ahead.
> * Displaying the translated strings is done by gettext or by generating
>   the documentation (this is usually done by the tool used for the
>   string extraction).
Oui...
> I don't exclude manual translation: the extraction is done by a human,
> who stores an original string in her brain, translates it and writes it
> in the translated document.
...
> Thus IMO, advantages of the XLIFF format should be demonstrated by
> considering XLIFF as the database. I won't consider having good
> translation tools or string extraction tools that deal with XLIFF files
> as an advantage of the XLIFF format.
Well, then, if you won't, why bother going that far trying to demonstrate that anything is just as good as anything else?
> My preferred translation tool is vi. Thus, as a translator, I prefer PO
> to XLIFF or a MySQL database.
And here lies the problem: you consider vi a translation tool. Now tell me, how many people who do translations on a daily basis would consider vi a translation tool?
I think you are mixing two things here: how to manage the back end, and what format to propose to end translators. The back end could be managed using any format, even a format developed only for Debian. There are plenty of ways to do that and I don't question the Debian way (or what is going to be the Debian way).
What I am saying (maybe for the 3rd or 4th mail) is that opening the translation process to people outside the GNU/Linux world could be the result of adopting a translation industry standard: it benefits the end translator because more tools exist, with more options, on more platforms, with less nerdness to overcome before actually starting to translate. And since that end-translator format _happens_ to double as a TM management format (and can easily be transformed to TMX, the TM "exchange" format, which is also an industry standard), why not use that format as the storage format? That would save transformations.
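As a sketch of that TMX transformation target, here is a minimal TMX 1.4 file holding one translation unit (the header attribute values and segment texts are illustrative, not from any real memory):

```xml
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1"
          segtype="sentence" o-tmf="plain" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <!-- one tu can carry any number of language variants (tuv) -->
    <tu>
      <tuv xml:lang="en"><seg>Unable to open file</seg></tuv>
      <tuv xml:lang="fr"><seg>Impossible d'ouvrir le fichier</seg></tuv>
    </tu>
  </body>
</tmx>
```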
> It seems you like to have sentences separated. This is not related to
> using XLIFF. This is a string extraction issue.
> Note however that most of the complaints I receive about po4a (about
> its string extraction features) are that there is too little context in
> the strings proposed to the translators, not that paragraphs should be
> split into sentences.
Now you are talking about another problem: the suitability of the format for the task. And I think your translators are very much aware of that: sentence translation is not appropriately handled by PO-based tools. Just like you mention in your other mail: multilingual bodies are not properly handled by PO-based tools either.
Context was an issue in the translation world before PO ever existed, and before computer people realized there was a need for localizing their strings. And the proof that they did not quite get it right is the character set issues that are only now starting to get solved, with Unicode being pretty much generally accepted.
Now it happens that quite a number of translation-based groups (using computers and not the other way round) have analysed this problem and have come up with a number of reasonable standards that are also starting to gain common acceptance: XLIFF for the translation itself, TMX for its exchange, TBX for glossaries, SRX for segmentation (not limited to "sentence" segmenting). The standards work very well together and offer a stable common ground on which back ends are created, tools are developed, translations are accomplished, etc.
And, of course, all the above processes include from the earliest premises what it took the computer people so long to figure out, because computer people are not essentially aware of localization and translation issues, mostly because they don't need to be. And that is fair enough.
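To illustrate how SRX separates the *description* of segmentation from the tools that apply it, here is a minimal SRX 2.0 rule file (the structure follows the SRX 2.0 specification; the two rules themselves are just examples):

```xml
<srx version="2.0" xmlns="http://www.lisa.org/srx20">
  <header segmentsubflows="no" cascade="no"/>
  <body>
    <languagerules>
      <languagerule languagerulename="default">
        <!-- do not break after a few common abbreviations -->
        <rule break="no">
          <beforebreak>\b(Mr|Dr|e\.g)\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <!-- break after sentence-final punctuation followed by space -->
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern=".*" languagerulename="default"/>
    </maprules>
  </body>
</srx>
```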
> It seems you prefer XLIFF for HTML translations. I don't know why, but
> it is probably not related to the XLIFF format. Maybe it is just that
> the tool you use with your XLIFF files is better than what a PO tool
> would have done (at what step: string extraction, translation?).
No, I don't prefer XLIFF for HTML transformations; I don't transform, I translate.
I understand that gettext, PO, and all the related tralala do what they are asked to do: extract text from the source, put it in a nice package for display in a string editor, and insert the finished product back.
Now my idea, but I may have gotten that wrong, is that all this was developed in the very specialized context of _localizing certain applications_, not of translating. Localization and translation are two different processes, and although they overlap they are not equivalent. PO comes from a specific localization context, and extending it forever to the complexity of current document formats and workflows seems like a mistake to me, since it was not designed for that in the first place. Cf. the two examples above (segmentation and multilingualism). The translation world, meanwhile, has produced standards that also apply to localization processes.
> One of the advantages of XLIFF could be for storing the old original
> and translated strings (this is convenient to check what changed in the
> original string; i.e. was the original string updated just to fix a
> typo, or did the meaning change?). With a PO, a versioning system can
> help to check what changed; but this doesn't always work smoothly (e.g.
> when a string moves inside a PO).
Because it is not designed for that.
> This is not directly an advantage for the translator.
??? Oh really ?
> It is just something that can help the translation tool to propose
> another feature.
Propose to whom? Someone other than the translator?
> With a PO, this could be done by providing the old and new PO to a tool
> (I would love to have such a tool).
Well, that already exists with tools that support a standard that's
relevant to translation, which is not the case for po as you say.
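The old-PO/new-PO comparison wished for above is easy to sketch. In the snippet below, PO parsing is stubbed out as `{msgid: msgstr}` dicts, and the `diff_po` helper is hypothetical; a real script could build the dicts with a PO parser such as polib (an assumption about tooling, not something from this thread):

```python
import difflib

def diff_po(old: dict, new: dict) -> dict:
    """Compare two {msgid: msgstr} mappings, immune to entries
    moving around inside the files."""
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    removed = sorted(old_ids - new_ids)
    # Pair each vanished msgid with a near-identical new one, so a typo
    # fix in the original shows up as an edit, not as a brand new string.
    edited = {}
    for msgid in removed:
        close = difflib.get_close_matches(msgid, added, n=1, cutoff=0.8)
        if close:
            edited[msgid] = close[0]
    # Identical msgids whose translation changed.
    changed = sorted(m for m in old_ids & new_ids if old[m] != new[m])
    return {"edited": edited, "changed": changed}

old = {"Choose a colour:": "Choisissez une couleur :", "Quit": "Quitter"}
new = {"Choose a color:": "Choisissez une couleur :", "Quit": "Quitter"}
print(diff_po(old, new))  # the spelling fix is paired as an edit of the old msgid
```

This answers exactly the "typo fix or meaning change?" question from the quoted paragraph: near-identical msgids are reported as edits, so the translator can see how small the change really was.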
> One feature I don't like in XLIFF is having multiple languages in the
> same XLIFF file. This was proved wrong with the debconf translations
> (multiple translation updates can't be committed).
1) An XLIFF file is not required to have more than two translation unit variants.
2) If you use formats that are not designed to support multiple TUVs, chances are the result will not be satisfying.
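A sketch of the point in 1): an XLIFF 1.2 `<file>` element is bilingual by design, with one source-language and one target-language, and older revisions travel in `<alt-trans>` rather than as extra language variants (identifiers and texts are illustrative):

```xml
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="template.pot" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="u42">
        <source>Choose a color:</source>
        <target>Choisissez une couleur :</target>
        <!-- the previous revision travels with the unit;
             it is not a third language variant -->
        <alt-trans origin="previous-version">
          <source>Choose a colour:</source>
          <target>Choisissez une couleur :</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>
```

This is also the mechanism behind storing "the old original and translated strings" mentioned earlier: the history lives inside the unit itself instead of in a version control system.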
> If you claim that XLIFF translation tools are better integrated with
> translation memories, it is "just" a translation tool issue. And a PO
> translation tool could also use TMX, or XLIFF memories could be
> converted to gettext compendia.
You are saying PO "could" be used for things it is not designed for. I am saying standards exist that are designed for that specific task, and tools exist that do not require the end user (the translator) to accept nerdiness.
> Then if XLIFF based tools are really much better than PO based tools,
> maybe these tools should be used (PO could be created temporarily if
> needed). The links you provided did not convince me.
The tools are not "much better", the tools are designed to support a
format that is designed for the tasks needed in the Debian i18n
framework. If you want to write something from scratch using formats
that are not designed for the job you are bound to face a number of
difficulties. You already described a number of po limitations, and
those are limitations not because of the tools, but because po is not
the proper format to accomplish the task.
Now you may always argue that since po has been around for a while,
it is always better to adapt the existing framework, and I'd
understand this conservative approach, which is very Debianese, I
really have nothing against that. But in the end, if it is about
creating from scratch a framework that deals with all sorts of
translation centered issues, it is better to use a format that has
been designed from scratch to deal with those issues.
The two links were not meant to "convince" anybody, but to give a glimpse of 1) how XLIFF can be made to fit a PO-centered system, with a transition to XLIFF in mind, and 2) how XLIFF is used in a broader perspective.
There are plenty of people who have written extensively on localization issues (see the IBM developer site, OASIS, LISA) and I can assure you that they have plenty of good reasons to use such standards and not PO.
Joyeux Noël anyway :)
Jean-Christophe Helary