Re: Announcing the Open Source Translation Database

To: debian-i18n@lists.debian.org
Subject: Re: Announcing the Open Source Translation Database
From: "Sveinn í Felli (IMAP)" <sveinki@nett.is>
Date: Mon, 12 Mar 2012 09:36:29 +0000
Message-id: <20120312094251.446E713A48FB@liszt.debian.org>
In-reply-to: <201203090856.q298uRFY024818@littlesvr.ca>
References: <201203090856.q298uRFY024818@littlesvr.ca>

On Fri 9 Mar 2012 19:16, Andrew Smith wrote:
> Hi Sveinn
> 
> On 2012-03-09 03:56, Sveinn í Felli wrote:
>> Don't know whether to reply directly or to
>> <debian-i18n@lists.debian.org>:
>>
> Well then I'll throw it back into the list :)
> 
-----------
>> I keep old versions to feed my TMs and to make
>> compendiums and glossaries.
> What's a TM? How do you manage so many po files, do you have
> some kind of automation or is it all in your head? Though it
> seems like it would be extremely challenging to keep so much
> in your head.
> 
Sorry about the jargon: TM = Translation Memory, mostly kept as
flat TMX files for Lokalize, OmegaT and other translation
software. I also (still) use KBabel for some tasks; it uses
a kind of Berkeley DB as its backend TM.
The best way to feed translations into KBabel is to let it
read through a directory structure full of PO files.
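For anyone who hasn't met TMX: it is an XML format in which each
translation unit (<tu>) holds one variant per language
(<tuv xml:lang="...">) wrapping the text in a <seg>. Below is a
minimal Python sketch of writing such a flat file from
(source, translation) pairs, e.g. ones already pulled out of PO
files. The helper name pairs_to_tmx and the creationtool value
are made up for illustration, and the header attributes are the
ones TMX 1.4b requires as far as I know; real tools such as
OmegaT or Lokalize write more metadata than this.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, srclang="en", tgtlang="is"):
    """Build a minimal TMX 1.4 document from (source, target) pairs.

    Hypothetical helper for illustration only.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(
        tmx,
        "header",
        {
            "creationtool": "po2tmx-sketch",  # placeholder tool name
            "creationtoolversion": "0.1",
            "segtype": "sentence",
            "o-tmf": "PO",
            "adminlang": "en",
            "srclang": srclang,
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        if not target:
            continue  # untranslated entries add nothing to a TM
        tu = ET.SubElement(body, "tu")
        # One <tuv> per language, each wrapping its text in a <seg>.
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(pairs_to_tmx([("Open", "Opna"), ("Save", "Vista"), ("Quit", "")]))
```

Skipping empty translations keeps the TM from suggesting blanks.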

------------
> 
> I suspect most of KDE, Firefox, OpenOffice, and other apps
> are not in that list because they use their own systems (I
> think it's XPI files for Firefox for example).

The XPIs contain *.properties files, a format commonly used
for Java-based software; Zimbra is an example that comes to
mind. There are neat little converters for these, though,
such as prop2po in the Translate Toolkit.
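Roughly, such a converter turns each key/value line of a
.properties file into a PO entry. The sketch below is
deliberately simplified and is not prop2po's actual code: it
only understands key=value and key: value lines plus #/!
comments, and ignores line continuations, \uXXXX escapes and
whitespace-separated keys, all of which a real converter must
handle. Keeping the key as msgctxt is a choice made here so
that identical English strings with different keys stay
distinct; the function names are hypothetical.

```python
def parse_properties(text):
    """Return (key, value) pairs from simple .properties content.

    Simplified sketch: no line continuations, no escape handling.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        # Split on the first '=' or ':', whichever comes first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue  # not a key/value line in this simplified model
        sep = min(positions)
        pairs.append((line[:sep].strip(), line[sep + 1:].strip()))
    return pairs


def po_quote(s):
    """Escape a string for use inside a PO double-quoted literal."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\t", "\\t")
    return '"' + escaped + '"'


def properties_to_po(text):
    """Render source-language .properties content as POT-style entries."""
    entries = []
    for key, value in parse_properties(text):
        # The property key becomes msgctxt; msgstr is left for the translator.
        entries.append(
            f'msgctxt {po_quote(key)}\nmsgid {po_quote(value)}\nmsgstr ""\n'
        )
    return "\n".join(entries)
```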

> 
> It would definitely be great if I could import all those
> into the database too, but it would be hard to justify the
> programming effort if it's just one piece of software. If
> you work with those communities, maybe you can help me
> understand how they work and how I could work with them for
> mutual advantage? Either here or offline.

Sorry, my skills lie elsewhere than in scripting or
programming; I'm more of a language nerd than a computer buff ;-)
But for my language I'm involved in quite a few projects, so
if you need info on where/how some odd piece of software is
translated, feel free to ping me.

> 
>> So, my first question is (obviously?): is there a way to
>> bulk-submit files?
>> Do you use pointers to repositories for the big projects (I
>> guess that's what 'bubulle' helped you with)?
>>
> Originally I was going to have a feature where po files are
> pulled automatically from version control systems, but then
> I realised that with Git and Mercurial (more and more
> popular every day) I cannot check out just the head revision
> of just the po directory. I need to pull /everything/ which
> is not realistic given my bandwidth limitations. I don't
> even know if I'd have the disk space to handle a lot of them.
> 
> Where the 11 million translations came from is a tarball
> Christian made with (roughly speaking) every PO file in
> Debian. I wrote a special PHP script to traverse that tree,
> parse the PO files, and insert all new translations into the
> database.
> 
> I would be more than happy to work with you or anyone else
> to come up with systems to import more translations in. So
> if you have ideas for how this would be useful to you (or
> others) - by all means let me know.

Have you checked out http://open-tran.eu/ ?
Some years ago I had an exchange with rzyjontko about
PO sources for my language; if I recall correctly, he was
parsing PO files from the published language packs of the
various projects. I don't know where he keeps his repo, but he
may have interesting scripts for fetching/parsing the packages.
I presume it's all OSS.
Just my 2 centimes.
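The import Andrew describes (traverse a tree, parse the PO
files, insert only translations not already in the database)
might look roughly like this in Python. This is not his PHP
script, just a sketch under stated assumptions: the PO parser
is naive (no msgctxt, plural forms or fuzzy flags, and files
are read as UTF-8 instead of honouring the charset in the
header), the language is taken from the po/<lang>.po file name,
and the table layout is invented. Deduplication is left to a
UNIQUE constraint with SQLite's INSERT OR IGNORE.

```python
import os
import sqlite3

_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unquote(token):
    """Strip the quotes of a PO string literal and decode simple escapes."""
    body = token.strip()[1:-1]
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(_ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def parse_po(text):
    """Yield (msgid, msgstr) pairs from PO text.

    Naive sketch: plural entries, msgctxt and obsolete (#~) lines are
    ignored; the header (empty msgid) and untranslated strings are skipped.
    """
    msgid = msgstr = field = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            if msgid and msgstr:
                yield msgid, msgstr
            msgid, msgstr, field = _unquote(line[6:]), None, "msgid"
        elif line.startswith("msgstr "):
            msgstr, field = _unquote(line[7:]), "msgstr"
        elif line.startswith('"') and field == "msgid":
            msgid += _unquote(line)  # continuation of a multi-line msgid
        elif line.startswith('"') and field == "msgstr":
            msgstr += _unquote(line)  # continuation of a multi-line msgstr
        else:
            field = None
    if msgid and msgstr:
        yield msgid, msgstr


def import_tree(root, db):
    """Walk `root`, parse every *.po file, insert translations not seen before.

    Returns the number of rows actually inserted.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        " lang TEXT NOT NULL, msgid TEXT NOT NULL, msgstr TEXT NOT NULL,"
        " UNIQUE (lang, msgid, msgstr))"
    )
    before = db.total_changes
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".po"):
                continue
            lang = name[:-3]  # "is.po" -> "is" (naming convention assumed)
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                rows = [(lang, mid, mstr) for mid, mstr in parse_po(f.read())]
            # Duplicates hit the UNIQUE constraint and are silently skipped.
            db.executemany(
                "INSERT OR IGNORE INTO translations (lang, msgid, msgstr)"
                " VALUES (?, ?, ?)",
                rows,
            )
    db.commit()
    return db.total_changes - before
```

Running import_tree twice over the same tree should insert
nothing the second time, which is what makes repeated bulk
imports safe.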

> 
>> If not, would it be reasonable to feed in my various
>> compendium files (as *.po)? Does it matter which project
>> the strings come from (do you track it)?
>>
> The source of the translations matters so that when the
> users get their automatic translation they can decide
> whether it's what they want based on what software they're
> from originally. Right now if you mouse over the results -
> it will show one of the sources. When I fix it - it will
> show all the software where that translation has been used.
> 

I suspected as much; it's the logical approach.
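Showing "all the software where that translation has been used"
is essentially a many-to-many relation between translations and
projects. A minimal in-memory sketch, with hypothetical names:
each (language, msgid) maps to its candidate translations, and
each candidate keeps a set of projects rather than a single
source field. Ranking candidates by how many projects use them
is an assumption added for illustration; in a database the same
relation would normally be a separate link table.

```python
from collections import defaultdict


class TranslationIndex:
    """Remember every project in which a given translation was seen."""

    def __init__(self):
        # (lang, msgid) -> {msgstr: set of project names}
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, lang, msgid, msgstr, project):
        # Seeing the same translation in another project extends its
        # source set instead of overwriting a single stored source.
        self._index[(lang, msgid)][msgstr].add(project)

    def suggest(self, lang, msgid):
        """Return [(msgstr, sorted projects)], most widely used first."""
        candidates = self._index.get((lang, msgid), {})
        return sorted(
            ((msgstr, sorted(projects)) for msgstr, projects in candidates.items()),
            key=lambda hit: (-len(hit[1]), hit[0]),
        )
```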

I saw that you also got comments about licensing; those are
valid concerns, especially given how many translators don't
fill in the header information (or may not have access to
it, e.g. on Launchpad).
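Since attribution and licensing questions often come back to
the PO header, here is a small sketch for spotting headers that
were never filled in. It assumes the header msgstr (the entry
with an empty msgid) has already been extracted as a single
string of "Field: value" lines. The fields checked and the
placeholder strings (FULL NAME, LANGUAGE <LL@li.org>,
YEAR-MO-DA) are those typically left by xgettext-generated
templates, but both lists are assumptions to adjust.

```python
# Placeholders typically left in xgettext-generated templates (assumed).
PLACEHOLDERS = ("FULL NAME", "LANGUAGE <LL@li.org>", "YEAR-MO-DA")
# Header fields this sketch treats as needed for attribution (assumed).
CHECKED = ("Last-Translator", "Language-Team", "Language", "PO-Revision-Date")


def parse_header(header):
    """Split a PO header msgstr into a {field: value} dict."""
    fields = {}
    for line in header.splitlines():
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def missing_header_fields(header):
    """Return the checked fields that are absent, empty or still placeholders."""
    fields = parse_header(header)
    missing = []
    for name in CHECKED:
        value = fields.get(name, "")
        if not value or any(p in value for p in PLACEHOLDERS):
            missing.append(name)
    return missing
```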

Best regards
Sveinn í Felli
