Re: Announcing the Open Source Translation Database

To: debian-i18n@lists.debian.org
Subject: Re: Announcing the Open Source Translation Database
From: Andrew Smith <asmith16@littlesvr.ca>
Date: Fri, 09 Mar 2012 14:16:06 -0500
Message-id: <4F5A56F6.8080708@littlesvr.ca>
In-reply-to: <201203090856.q298uRFY024818@littlesvr.ca>
References: <201203090856.q298uRFY024818@littlesvr.ca>

Hi Sveinn

On 2012-03-09 03:56, Sveinn í Felli wrote:

Don't know whether to reply directly or to
<debian-i18n@lists.debian.org>:

Well then I'll throw it back into the list :)

I'm a coordinator for a language (is_IS) in several dozens
of projects, meaning I'm in the business of coordinating
terms and translations consistently (+spellchecking) across
different projects. Of course I translate too ;-)

Cool!

This means I've got a bunch of PO-files partially or fully
translated; last count made around 200.000 POs of various
generations. Of those maybe ~3500 are 'active' project
files, I keep old versions to feed my TMs and to make
compendiums and glossaries.

What's a TM? How do you manage so many PO files? Do you have some kind of automation, or is it all in your head? Though it seems like it would be extremely challenging to keep so much in your head.

Big projects like *buntu, KDE and LibreOffice are huge - my
last compendium for LO-UI had about 28.000 strings excluding
single words. A while ago I counted 400.000 strings in one
of my TMs.

I imported from about 70% of the PO files in Christian's tarball. But I think it only included translations from software with easily accessible PO files.

I suspect most of KDE, Firefox, OpenOffice, and other apps are not in that list because they use their own systems (I think it's XPI files for Firefox, for example).

It would definitely be great if I could import all those into the database too, but it would be hard to justify the programming effort if it's just one piece of software. If you work with those communities, maybe you can help me understand how they work and how I could work with them for mutual advantage? Either here or offline.

So, my first question is (obviously?); is there a way for
bulk-submitting files ?
Do you use pointers to repositories for the big projects (I
guess that's what 'bubulle' helped you with) ?

Originally I was going to have a feature where PO files are pulled automatically from version control systems, but then I realised that with Git and Mercurial (more and more popular every day) I cannot check out just the head revision of just the po directory. I need to pull /everything/, which is not realistic given my bandwidth limitations. I don't even know if I'd have the disk space to handle a lot of them.

The 11 million translations came from a tarball Christian made with (roughly speaking) every PO file in Debian. I wrote a special PHP script to traverse that tree, parse the PO files, and insert all new translations into the database.
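
For readers curious what such an import looks like, here is a minimal sketch of the traverse/parse/insert idea. It is not the actual script (that one is PHP, and its details are not given here): this version is Python, uses SQLite as a stand-in database, and the table name, column names, and function names are all made up for illustration. The parser handles only plain msgid/msgstr pairs and skips plural forms, contexts, and untranslated entries.

```python
import os
import sqlite3

ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote(token):
    """Decode one double-quoted PO string token such as "Open \"file\"\n"."""
    body = token.strip()[1:-1]
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def parse_po(text):
    """Return (msgid, msgstr) pairs for simple, translated entries only."""
    pairs = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        fields = {"msgid": "", "msgstr": ""}
        current = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("msgid "):
                current = "msgid"
                fields[current] = unquote(line[len("msgid "):])
            elif line.startswith("msgstr "):
                current = "msgstr"
                fields[current] = unquote(line[len("msgstr "):])
            elif line.startswith('"') and current:
                fields[current] += unquote(line)  # continuation line
            else:
                current = None  # comments, msgctxt, plurals: not handled
        # Skips the header (empty msgid) and untranslated entries.
        if fields["msgid"] and fields["msgstr"]:
            pairs.append((fields["msgid"], fields["msgstr"]))
    return pairs


def import_tree(root, conn):
    """Walk `root`, parse every .po file, insert translations not seen before.

    Returns the number of rows actually inserted. The schema is hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        "source_file TEXT, msgid TEXT, msgstr TEXT, "
        "UNIQUE (source_file, msgid, msgstr))"
    )
    before = conn.total_changes
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".po"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                pairs = parse_po(handle.read())
            rel = os.path.relpath(path, root)
            conn.executemany(
                "INSERT OR IGNORE INTO translations VALUES (?, ?, ?)",
                [(rel, msgid, msgstr) for msgid, msgstr in pairs],
            )
    conn.commit()
    return conn.total_changes - before
```

Keeping the source file next to each translation is what would later allow showing which software a suggestion came from. Running the import a second time over the same tree inserts nothing, because the UNIQUE constraint plus INSERT OR IGNORE drops rows that already exist.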

I would be more than happy to work with you or anyone else to come up with systems to import more translations in. So if you have ideas for how this would be useful to you (or others) - by all means let me know.

If not; would it be reasonable to feed in my various
compendium files (as *.po) ? Does it matter from which
project the strings come from (do you track it) ?

The source of the translations matters so that when users get their automatic translation they can decide whether it's what they want based on what software it originally came from. Right now if you mouse over the results, it will show one of the sources. When I fix it, it will show all the software where that translation has been used.

That wouldn't matter so much if the OSTD user spoke the language they're translating into, but in most cases they won't.

Then some feedback;
Tested submitting one POT-file for translation, chose one I
knew that had a partial translation at debian.

The suggestions seem OK at first glance. But shouldn't they
be marked as 'fuzzy' ? Or at least given the choice for that ?

That's a judgement call I had to make. I decided to add a comment to every automatically generated translation so that people will know it's not as reliable as a human translation. Fuzzy translations in my experience are ignored and eventually discarded by some software or another, and that would almost negate the benefits for new project maintainers.
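
To make the difference concrete, here is a small Python sketch that writes the same entry both ways. The function names and the wording of the note are hypothetical, not what OSTD actually emits. In the PO format a line starting with "# " is a translator comment, while "#, fuzzy" is a flag; msgfmt leaves fuzzy entries out of the compiled catalog unless told otherwise with --use-fuzzy, which is the "ignored" behaviour described above.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO string."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\t", "\\t"))


def format_entry(msgid, msgstr, note=None, fuzzy=False):
    """Render one PO entry, optionally with a comment and/or the fuzzy flag."""
    lines = []
    if note:
        lines.append("# " + note)   # translator comment: informational only
    if fuzzy:
        lines.append("#, fuzzy")    # flag: tools treat the entry as unreviewed
    lines.append('msgid "%s"' % po_escape(msgid))
    lines.append('msgstr "%s"' % po_escape(msgstr))
    return "\n".join(lines) + "\n"


# Comment approach: the translation stays active and carries a warning.
commented = format_entry("File", "Skrá", note="Automatic translation (example note)")

# Fuzzy approach: the same translation, but flagged so tools may skip it.
flagged = format_entry("File", "Skrá", fuzzy=True)
```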

The idea is that if your software is partially translated into a language, and maybe there are some mistakes, a translator from that language will be encouraged to fix it, more than they would be encouraged to start a translation from scratch. At least I hope so :)

Other remark: the output has a default name LANG.po; I think
it should either keep the original POT name or at least
write the original name in the header.

Sounds like a great idea. I will put it on my todo list; it should be simple enough to mention the POT name in the header.
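
One possible shape for that, sketched in Python: the PO header is the msgstr of the empty msgid, made of "Name: value" lines, so the POT name could go in as one more line. The field name X-Source-POT and the function name are invented for this example; nothing here says OSTD will use them.

```python
def add_header_field(header, name, value):
    """Return the decoded PO header text with `name` set to `value`.

    `header` is the msgstr of the empty msgid, one "Name: value" per line.
    An existing line for `name` is replaced rather than duplicated.
    """
    lines = [line for line in header.splitlines()
             if line and not line.startswith(name + ":")]
    lines.append("%s: %s" % (name, value))
    return "\n".join(lines) + "\n"


header = "Project-Id-Version: demo\nLanguage: is\n"
# "X-Source-POT" is a hypothetical custom field, not a standard one.
updated = add_header_field(header, "X-Source-POT", "hello.pot")
```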

That's about it.
Congratulations for your project, surely it will help out
many people.

Thanks!

Best regards,
Sveinn í Felli
