
Re: Work on a centralized infrastructure for i18n/l10n



I usually split the translation process as:

  string
extraction             translation
----------->          ------------>
             Database                Translator
<-----------          <------------
 displaying                tool
 translated
  strings

Sure, everybody does.

 * Some string extraction tools exist: xgettext, poxml, po4a, xml2po, and
   some tools generating XLIFF.

Fair enough. Any textual data can be converted to any kind of format.

 * The database can be a PO file, an XLIFF file, or (why not) another
   database.
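To make the "database" comparison concrete, here is the same entry sketched in both formats (the file contents are made up for illustration; the XLIFF fragment follows the 1.x trans-unit structure):

```po
#: install.sgml:42
msgid "Select the packages to install."
msgstr "Sélectionnez les paquets à installer."
```

```xml
<trans-unit id="1">
  <source xml:lang="en">Select the packages to install.</source>
  <target xml:lang="fr">Sélectionnez les paquets à installer.</target>
</trans-unit>
```

Either fragment carries the same original/translation pair; the question in this thread is what the container format should be.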

Definitely, any placeholder with a structure is good.

 * The translation tool is able to deal with the format of the database
   and helps the translator (merging old translations, displaying the
   strings that need to be updated, etc.).

Sounds like a lecture for freshman CAT-tool users, but go ahead.

 * Displaying the translated strings is done by gettext or by generating
   the documentation (this is usually done by the tool used for the string
   extraction).

Oui...

I don't exclude manual translation: the extraction is done by a
human, who stores an original string in her brain, translates it, and
writes it in the translated document.

...

Thus IMO, the advantages of the XLIFF format should be demonstrated by
considering XLIFF as the database. I won't count having good translation
tools or string extraction tools that deal with XLIFF files as an
advantage of the XLIFF format.

Well then, if you won't, why bother going that far trying to demonstrate that anything is just as good as anything else?

My preferred translation tool is vi. Thus, as a translator, I prefer PO to
XLIFF or a MySQL database.

And here lies the problem: you consider vi a translation tool. Now tell me, how many people who do translations on a daily basis would consider vi a translation tool?

I think you are mixing two things here: how to manage the back end, and what format to propose to end translators. The back end could be managed using any format, even one developed only for Debian. There are plenty of ways to do that, and I don't question the Debian way (or what is going to be the Debian way).

What I am saying (maybe in my 3rd or 4th mail) is that opening the translation process to people outside the GNU/Linux world could be the result of adopting a translation-industry standard that benefits the end translator: more tools exist, with more options, on more platforms, with less nerdiness to overcome before actually starting to translate. And since that end-user format _happens_ to be usable as a translation-memory management format as well (and can easily be transformed into TMX, the TM "exchange" format, also an industry standard), why not use that format as the storage format? That would save transformations.

It seems you like to have sentences separated. This is not related to
using XLIFF; it is a string extraction issue.
Note however that most of the complaints I receive about po4a (about its
string extraction features) are that there is too little context in the
strings proposed to translators, not that paragraphs should be split
into sentences.

Now you are talking about another problem: the adequacy of the format for the task. And I think your translators are very much aware of that: sentence translation is not appropriately handled by PO-based tools. Just as you mention in your other mail, multilingual bodies are not properly handled by PO-based tools either.

Context was an issue in the translation world before PO ever existed and before computer people realized there was a need for localizing their strings. And the proof that they did not quite get it right is the character-set issues that are only now starting to get solved, with Unicode being pretty much generally accepted.

Now it happens that quite a number of translation-based groups (using computers, and not the other way round) have analysed this problem and have come up with a number of reasonable standards that are also starting to gain common acceptance: XLIFF for the translation itself, TMX for its exchange, TBX for glossaries, SRX for segmentation (not limited to "sentence" segmenting). The standards work very well together and offer a stable common ground on which back ends are created, tools are developed, translations are accomplished, etc.

And, of course, all the above processes have included from the earliest premises what it took the computer people so long to figure out, because computer people are not essentially aware of localization and translation issues, mostly because they don't need to be. And that is fair enough.

It seems you prefer XLIFF for HTML translations. I don't know why, but it
is probably not related to the XLIFF format. Maybe it is just that the tool
you use with your XLIFF files is better than what a PO tool would have
done (at which step: string extraction, translation?).

No, I don't prefer XLIFF for HTML transformations; I don't transform, I translate. I understand that gettext and PO and all the related tralala do what they are requested to do: extract text from source, put it in a nice package for display in a string editor, and insert the finished product back.

Now my idea, but I may have gotten that wrong, is that all this was developed in the very specialized context of _localizing certain applications_, not of translating. Localization and translation are two different processes, and although they overlap they are not equivalent. PO comes from a specific localization context, and extending it forever to the complexity of current document formats and workflows seems like a mistake to me, since it was not designed for that in the first place. Cf. the two examples above (segmentation and multilingualism). The translation world, meanwhile, has produced standards that also apply to localization processes.

One of the advantages of XLIFF could be storing the old original and
translated strings (this is convenient to check what changed in the
original string; i.e. was the original string updated just to fix a typo,
or did the meaning change?). With a PO, a versioning system can help
check what changed, but this doesn't always work smoothly (e.g. when a
string moves inside a PO).

Because it is not designed for that.

This is not directly an advantage for the translator.

??? Oh, really?

It is just something
that can help the translation tool to propose another feature.

Propose to whom? To somebody other than the translator?

With a PO, this could be done by providing the old and new PO to a tool (I would
love to have such a tool).
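As a rough sketch of what such a PO-comparison tool could look like (this is a naive illustration, not an existing tool: it only handles simple single-line msgid/msgstr entries and ignores plurals, comments, obsolete entries, and multi-line strings):

```python
# Naive sketch: report which msgids were added, removed, or retranslated
# between an old and a new PO file. Only single-line entries are handled.
import re

def read_po(text):
    """Return a {msgid: msgstr} dict from a simplified PO file."""
    entries = {}
    for msgid, msgstr in re.findall(r'msgid "(.*)"\s*\nmsgstr "(.*)"', text):
        if msgid:  # skip the header entry, whose msgid is empty
            entries[msgid] = msgstr
    return entries

def diff_po(old_text, new_text):
    """Classify strings as added, removed, or changed (retranslated)."""
    old, new = read_po(old_text), read_po(new_text)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(m for m in set(old) & set(new)
                          if old[m] != new[m]),
    }

# Example: one retranslation, one removal, one new untranslated string.
old = 'msgid "Hello"\nmsgstr "Bonjour"\n\nmsgid "Bye"\nmsgstr "Au revoir"\n'
new = 'msgid "Hello"\nmsgstr "Salut"\n\nmsgid "Welcome"\nmsgstr ""\n'
print(diff_po(old, new))
# → {'added': ['Welcome'], 'removed': ['Bye'], 'changed': ['Hello']}
```

A real tool would of course parse PO properly (multi-line strings, fuzzy flags) and compare msgids positionally to survive reordering, but the principle is just a dictionary diff.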

Well, that already exists with tools that support a standard relevant to translation, which, as you say, is not the case for PO.

One feature I don't like in XLIFF is having multiple languages in the same
XLIFF file. This was proven wrong by the debconf translations (multiple
translation updates can't be committed).

1) An XLIFF file is not required to have more than two translation-unit variants. 2) If you use formats that are not designed to support multiple tuv, chances are that the result will not be satisfying.
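For reference, the truly multilingual case is what TMX was designed for: one translation unit can carry one tuv per language, while an XLIFF 1.x file declares exactly one source/target language pair. Sketched below (element names follow the published specifications; the content is made up):

```xml
<!-- TMX: one <tu>, as many <tuv> language variants as needed -->
<tu>
  <tuv xml:lang="en"><seg>Select the packages to install.</seg></tuv>
  <tuv xml:lang="fr"><seg>Sélectionnez les paquets à installer.</seg></tuv>
  <tuv xml:lang="de"><seg>Wählen Sie die zu installierenden Pakete.</seg></tuv>
</tu>

<!-- XLIFF 1.x: the <file> element itself is bilingual -->
<file original="templates.pot" datatype="plaintext"
      source-language="en" target-language="fr">
  ...
</file>
```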

If you claim that XLIFF translation tools are better integrated with
translation memories, it is "just" a translation-tool issue. A PO
translation tool could also use TMX, or XLIFF memories could be converted
to gettext compendia.
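For instance, gettext's own msgmerge can already reuse an external memory through its --compendium option (the file names here are hypothetical; such a compendium could be produced by converting an XLIFF or TMX memory to PO):

```shell
# fr.po:           current translation
# newstrings.pot:  freshly extracted template
# memory.po:       compendium, e.g. converted from an XLIFF/TMX memory
msgmerge --compendium=memory.po fr.po newstrings.pot -o fr-updated.po
```

Exact matches found in the compendium are filled into the updated PO, so the memory is consulted without the translator's format ever changing.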

You are saying PO "could" be used for things it is not designed for. I am saying standards exist that are designed for that specific task, and tools exist that do not require the end user (the translator) to accept nerdiness.

Then if XLIFF-based tools are really much better than PO-based tools,
maybe those tools should be used (PO could be created temporarily if
needed). The links you provided did not convince me.

The tools are not "much better"; the tools are designed to support a format that is designed for the tasks needed in the Debian i18n framework. If you want to write something from scratch using formats that are not designed for the job, you are bound to face a number of difficulties. You have already described a number of PO limitations, and those are limitations not because of the tools, but because PO is not the proper format to accomplish the task.

Now you may always argue that since PO has been around for a while, it is better to adapt the existing framework, and I'd understand this conservative approach, which is very Debianese; I really have nothing against that. But in the end, if it is about creating from scratch a framework that deals with all sorts of translation-centered issues, it is better to use a format that was designed from scratch to deal with those issues.

The two links were not meant to "convince" anybody, but to give a glimpse of 1) how XLIFF can be made to fit a PO-centered system, with a transition to XLIFF in mind, and 2) how XLIFF is used in a broader perspective.

There are plenty of people who have written extensively on localization issues (see the IBM developer site, OASIS, LISA), and I can assure you that they have plenty of good reasons to use such standards and not PO.

Joyeux Noël anyway :)

Jean-Christophe Helary

