
Re: Announcing the Open Source Translation Database



Hi Sveinn

On 2012-03-09 03:56, Sveinn í Felli wrote:
Don't know whether to reply directly or to
<debian-i18n@lists.debian.org>:

Well then I'll throw it back into the list :)

I'm a coordinator for a language (is_IS) in several dozens
of projects, meaning I'm in the business of coordinating
terms and translations consistently (+spellchecking) across
different projects. Of course I translate too ;-)

Cool!

This means I've got a bunch of PO-files partially or fully
translated; last count made around 200.000 POs of various
generations. Of those maybe ~3500 are 'active' project
files, I keep old versions to feed my TMs and to make
compendiums and glossaries.
What's a TM? How do you manage so many PO files? Do you have some kind of automation, or is it all in your head? It seems like it would be extremely challenging to keep that much in your head.

Big projects like *buntu, KDE and LibreOffice are huge - my
last compendium for LO-UI had about 28.000 strings excluding
single words. A while ago I counted 400.000 strings in one
of my TMs.

I imported from about 70% of the PO files in Christian's tarball. But I think it only included translations from software with easily accessible PO files.

I suspect most of KDE, Firefox, OpenOffice, and other apps are not in that list because they use their own systems (I think it's XPI files for Firefox for example).

It would definitely be great if I could import all those into the database too, but it would be hard to justify the programming effort for an importer that serves just one piece of software. If you work with those communities, maybe you can help me understand how they work and how I could work with them for mutual advantage? Either here or offline.

So, my first question is (obviously?); is there a way for
bulk-submitting files ?
Do you use pointers to repositories for the big projects (I
guess that's what 'bubulle' helped you with) ?

Originally I was going to have a feature where po files are pulled automatically from version control systems, but then I realised that with Git and Mercurial (more and more popular every day) I cannot check out just the head revision of just the po directory. I need to pull /everything/ which is not realistic given my bandwidth limitations. I don't even know if I'd have the disk space to handle a lot of them.

The 11 million translations came from a tarball Christian made with (roughly speaking) every PO file in Debian. I wrote a special PHP script to traverse that tree, parse the PO files, and insert all new translations into the database.
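
A minimal sketch of that kind of import, in Python rather than PHP, is below. It is not the actual OSTD script. The function names, the SQLite table layout, and the convention that the language comes from the file name (is.po) and the source package from the directory are all assumptions made for illustration. The parser handles only simple msgid/msgstr pairs and skips plurals, msgctxt and obsolete entries.

```python
import os
import re
import sqlite3


def unquote(literal):
    """Decode one double-quoted PO string literal (common escapes only)."""
    body = literal.strip()[1:-1]
    escapes = {"n": "\n", "t": "\t"}
    return re.sub(r"\\(.)", lambda m: escapes.get(m.group(1), m.group(1)), body)


def parse_po(text):
    """Return (msgid, msgstr) pairs for translated, non-plural entries."""
    entries = []
    msgid = msgstr = current = None

    def flush():
        # Keep only entries with a non-empty msgid and a non-empty msgstr.
        if msgid and msgstr:
            entries.append((msgid, msgstr))

    for line in text.splitlines() + [""]:  # trailing "" flushes the last entry
        line = line.strip()
        if line.startswith("msgid "):
            flush()
            msgid, msgstr, current = unquote(line[6:]), None, "id"
        elif line.startswith("msgstr "):
            msgstr, current = unquote(line[7:]), "str"
        elif line.startswith('"') and current == "id":
            msgid += unquote(line)  # continuation line of msgid
        elif line.startswith('"') and current == "str":
            msgstr += unquote(line)  # continuation line of msgstr
        else:
            # Blank line, comment, or unsupported keyword ends the entry.
            flush()
            msgid = msgstr = current = None
    return entries


def import_tree(root, db):
    """Walk root, parse every *.po file, insert translations not yet stored."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        "lang TEXT, source TEXT, msgid TEXT, msgstr TEXT, "
        "UNIQUE (lang, source, msgid, msgstr))"
    )
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".po"):
                continue
            lang = name[:-3]                         # assumed: is.po -> "is"
            source = os.path.relpath(dirpath, root)  # assumed: directory = package
            with open(os.path.join(dirpath, name),
                      encoding="utf-8", errors="replace") as fh:
                rows = parse_po(fh.read())
            db.executemany(
                "INSERT OR IGNORE INTO translations VALUES (?, ?, ?, ?)",
                [(lang, source, i, s) for i, s in rows],
            )
    db.commit()
```

Storing the source next to each translation is what would allow a result to list every package where that translation has been used. The UNIQUE constraint together with INSERT OR IGNORE means that re-running the import adds only new rows.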

I would be more than happy to work with you or anyone else to come up with systems to import more translations. So if you have ideas for how this would be useful to you (or others), by all means let me know.

If not; would it be reasonable to feed in my various
compendium files (as *.po) ? Does it matter from which
project the strings come from (do you track it) ?

The source of a translation matters because users who get an automatic translation can then judge whether it's what they want based on which software it originally came from. Right now, if you mouse over a result, it shows one of the sources. When I fix it, it will show all the software where that translation has been used.

That wouldn't matter so much if the OSTD user spoke the language they're translating into, but in most cases they won't.

Then some feedback;
Tested submitting one POT-file for translation, chose one I
knew that had a partial translation at debian.

The suggestions seem OK at first glance. But shouldn't they
be marked as 'fuzzy' ? Or at least given the choice for that ?

That's a judgement call I had to make. I decided to add a comment to every automatically generated translation so that people will know it's not as reliable as a human translation. Fuzzy translations in my experience are ignored and eventually discarded by some software or another, and that would almost negate the benefits for new project maintainers.
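
To illustrate the difference between the two choices, here is a small Python sketch that writes a PO entry either with a translator comment (a line starting with "# ") or with the standard "#, fuzzy" flag. The function names and the wording of the note are hypothetical. This is not OSTD's actual output code.

```python
def po_quote(s):
    """Encode a string as a double-quoted PO string literal."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def format_entry(msgid, msgstr, note=None, fuzzy=False):
    """Format one PO entry, optionally with a translator comment or fuzzy flag."""
    lines = []
    if note:
        lines.append(f"# {note}")   # translator comment, ignored by gettext tools
    if fuzzy:
        lines.append("#, fuzzy")    # flag: msgfmt skips fuzzy entries by default
    lines.append(f"msgid {po_quote(msgid)}")
    lines.append(f"msgstr {po_quote(msgstr)}")
    return "\n".join(lines) + "\n"


# Comment-only variant: the translation stays active in the compiled catalog.
print(format_entry("Open", "Opna", note="Automatic translation, please review"))
```

With the comment-only variant the suggestion still ends up in the compiled catalog and a human reviewer sees the note. With fuzzy=True the same entry would be left out of the compiled .mo file unless msgfmt is run with --use-fuzzy.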

The idea is that if your software is partially translated into a language, and maybe there are some mistakes, a translator from that language will be encouraged to fix it, more than they would be encouraged to start a translation from scratch. At least I hope so :)

Other remark: the output has a default name LANG.po; I think
it should either keep the original POT name or at least
write the original name in the header.

Sounds like a great idea. I will put it on my to-do list; mentioning the POT name in the header should be simple enough.

That's about it.
Congratulations for your project, surely it will help out
many people.

Thanks!

Best regards,
Sveinn í Felli


