
Wordforge, Pootle, GSoC, Debian



Hi all

I'm part of the WordForge team and a developer on Pootle and the Translate Toolkit ...

I've just read through hundreds of mails in the Debian-i18n list archive, as well as the i18n and l10n paper for DebConf (eagerly awaiting the video of the sessions...). It's been great to read the discussion, and I am excited about where things are headed and about how many of the ideas in the original discussion on this list are aligned with the ideas we have had in the WordForge project.

I thought I would provide some technical details on where Pootle is currently at and how it is designed, which will hopefully shed some light on some of the discussion. It may have been more helpful earlier, but I wanted to make sure I had followed the whole discussion first, hence the delay. By now some of this information is probably redundant, but anyway here we go...

History and Design
-------------------
We started working at the translate.org.za South African translation project on scripts that would enable us to translate Mozilla using PO files. We were dealing with 11 languages, and with non-technical translators who weren't very skilled with computers. So we started chipping away at the many different things you had to know in order to translate. Next we included tools to translate OpenOffice.org using PO files (migrated from some Perl tools that did this).

Every now and then we discussed doing a web portal, but all our attempts (and other people's that we looked at) didn't seem to meet our requirements. Dealing with multiple languages and multiple big upstream projects was part of the perspective from the beginning.

Finally after OOoCon in September 2004, I started working on Pootle, with a slightly different philosophy to before. Simplicity seemed to work and it took off fairly quickly. With collaboration from others, we began to see what we thought was really needed, and Javier and others have spent a lot of time documenting this and strategising about it.

Structure
---------

Explanation of different parts of the project:

The Translate Toolkit comprises:
- different storage modules that handle translation formats (PO, XLIFF, Mozilla and OpenOffice.org formats, more document-centric formats like HTML, hopefully others...)
- conversion tools to convert between the above formats
- checks that can be performed on translations (punctuation, capitalization, spelling, length of translation, etc, etc)
- other tools for manipulating translations (search, combining, merging etc)
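
As a rough illustration of the kind of checks listed above, here is a minimal Python sketch. The function names, the check names and the length threshold are all assumptions made up for this example; they are not the Translate Toolkit's actual API.

```python
PUNCTUATION = ".!?:;"


def _final_mark(text):
    """Return the trailing punctuation mark of text, or "" if there is none."""
    stripped = text.rstrip()
    if stripped and stripped[-1] in PUNCTUATION:
        return stripped[-1]
    return ""


def check_end_punctuation(source, target):
    """Pass if source and target end with the same punctuation mark (or none)."""
    return _final_mark(source) == _final_mark(target)


def check_start_capital(source, target):
    """Pass if source and target agree on whether the first letter is upper case."""
    if not source or not target:
        return True
    return source[0].isupper() == target[0].isupper()


def check_length(source, target, max_ratio=3.0):
    """Pass unless one string is more than max_ratio times longer than the other."""
    if not source or not target:
        return True
    longer = max(len(source), len(target))
    shorter = min(len(source), len(target))
    return longer / shorter <= max_ratio


# Hypothetical registry mapping a check name to its function.
CHECKS = {
    "end_punctuation": check_end_punctuation,
    "start_capital": check_start_capital,
    "length": check_length,
}


def run_checks(source, target):
    """Return the sorted names of all checks that the translation fails."""
    return sorted(name for name, check in CHECKS.items()
                  if not check(source, target))
```

Calling run_checks("Open file.", "Maak oop") reports only the end punctuation check, because the target has dropped the full stop.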

The storage modules are geared towards preserving the original source format of the translation file (translation files are documents, not just repositories of translations) and towards providing access to as much detail as possible.

We (Friedel in particular) are currently working towards a common API for the different translation formats, so that you can work with any of them using the same tools.
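
To make the idea of a common API concrete, here is a small sketch. Everything in it is hypothetical: the class names are invented, and the two "formats" are deliberately toy line-based formats rather than real PO or XLIFF. The point is only that tools written against the base class work with any format.

```python
class Unit:
    """One translatable string and its translation (empty if untranslated)."""

    def __init__(self, source, target=""):
        self.source = source
        self.target = target

    def is_translated(self):
        return bool(self.target)


class Store:
    """Format-independent interface that every storage class implements."""

    def __init__(self):
        self.units = []

    @classmethod
    def parse(cls, text):
        raise NotImplementedError

    def serialize(self):
        raise NotImplementedError


class KeyValueStore(Store):
    """Toy format: one 'source=target' pair per line."""

    @classmethod
    def parse(cls, text):
        store = cls()
        for line in text.splitlines():
            if line.strip():
                source, _, target = line.partition("=")
                store.units.append(Unit(source, target))
        return store

    def serialize(self):
        return "\n".join(f"{u.source}={u.target}" for u in self.units)


class TabStore(Store):
    """Toy format: one tab-separated 'source<TAB>target' pair per line."""

    @classmethod
    def parse(cls, text):
        store = cls()
        for line in text.splitlines():
            if line.strip():
                source, _, target = line.partition("\t")
                store.units.append(Unit(source, target))
        return store

    def serialize(self):
        return "\n".join(f"{u.source}\t{u.target}" for u in self.units)


def translated_count(store):
    """A tool that works on any Store, whatever the underlying format."""
    return sum(1 for unit in store.units if unit.is_translated())


def convert(store, target_cls):
    """Copy the units of one store into a new store of another format."""
    converted = target_cls()
    converted.units = [Unit(u.source, u.target) for u in store.units]
    return converted
```

Here translated_count and convert never look at the file format, which is the property a shared storage API is meant to provide.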

Pootle is (at least) a web application for managing translations. It is built on top of the Translate Toolkit, and utilises its storage modules, checks and conversion tools.

Currently Pootle only works directly with PO files and converts to other formats if required for download. However, it is in the process of being moved towards handling any of the formats natively (particularly XLIFF, which has a lot of useful functionality).

Internally Pootle is fairly well modularized. Here are the details:
 - modules that build the web pages and handle form submissions (Python files and Kid templates)
   * adminpages indexpage pagelayout translatepage templates/*.html
 - modules that extend the underlying storage interface, manage reading and writing translations, and manage project file trees
   * pootlefile potree projects versioncontrol
 - some minor scripts, tests etc
   * benchmark conflict2suggest conftest test_*
 - basic project stuff
   * __init__ __version__ filelocations (and the setup script)

Stuff that isn't totally modularized:
 - the main pootle.py contains the following:
   * stuff that directs URL requests to the appropriate pages (and checks user permissions)
   * some form handling happens here that should happen in the pages etc
   * commandline parsing
   * handling of user options like the UI language
   * some code to recreate statistics
 - users.py contains the following:
   * code for login, registration, activation and user options pages
   * the web Session class which contains web interface options but also a bit of stuff on user rights

I see one of the main ideas is separating out the frontend from the backend. This is of course a good idea, but a fair amount of that is modularized already. My interpretation is that what is meant is not just modularising the code but creating the ability to interface with the server in other ways than the web interface (through XML-RPC, email, etc, etc). This is of course very important.
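
One way to picture this kind of separation is a backend class that knows nothing about HTML, which is then exposed over XML-RPC using Python's standard library. This is only a sketch: TranslationBackend, its methods, and the host and port are hypothetical and do not describe Pootle's existing code.

```python
from xmlrpc.server import SimpleXMLRPCServer


class TranslationBackend:
    """Hypothetical backend with no knowledge of HTML or HTTP."""

    def __init__(self):
        # Maps (project, language) to a dict of source -> target strings.
        self._units = {}

    def add_unit(self, project, language, source, target=""):
        """Store a string and its translation (empty if untranslated)."""
        self._units.setdefault((project, language), {})[source] = target
        return True

    def get_stats(self, project, language):
        """Return total and translated unit counts for one project/language."""
        units = self._units.get((project, language), {})
        translated = sum(1 for target in units.values() if target)
        return {"total": len(units), "translated": translated}


def serve(backend, host="localhost", port=8000):
    """Expose the backend's public methods over XML-RPC (blocks forever)."""
    server = SimpleXMLRPCServer((host, port), allow_none=True)
    server.register_instance(backend)
    server.serve_forever()
```

A web interface, an email gateway or a desktop client could all call the same backend methods. With serve(TranslationBackend()) running, a client could use xmlrpc.client.ServerProxy("http://localhost:8000") and call get_stats("pootle", "af") on it.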

It remains to be seen how much separation is necessary. My recommendation would be to start by detailing the features required/expected (a fair amount of that has been done) and then to begin to implement them in the current structure. As that progresses it can be decided how much needs to be separated into different running programs etc. I'm not sure that we need to introduce a separate file server / web interface yet in terms of process structure (although conceptually that is mostly there already).

There are quite a few details in terms of how Pootle manages files, text indexing, statistics etc, as well as performance and scalability questions, but I will leave those for another thread later (but don't worry, we do think about these things).

The final part of the current essential structure is the web framework used. Pootle was written using jToolkit (http://jtoolkit.sourceforge.net). Basically the situation is that I am employed by another company to do commercial software development, and this is the free software web framework that we produce and use at my company. So for me the easiest way to start writing Pootle was to use it. It, like any web framework, has strengths and weaknesses.

The most glaring problem with it was that all the user interface code was generated procedurally in Python (some other toolkits take this approach), and it was quite ugly to read, and meant improving the interface was slow. So we have replaced that all with Kid (http://kid.lesscode.org) templates (not available when Pootle was started) which are very nice, and have improved stuff a lot and made the code more readable.
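
The difference between the two approaches can be sketched as follows. Note that this uses string.Template from the Python standard library purely as a stand-in; it is not Kid syntax, and the function names are invented for the example.

```python
from html import escape
from string import Template


def render_procedural(title, items):
    """Build the page by concatenating markup in code (the older style)."""
    parts = ["<html><head><title>", escape(title), "</title></head><body><ul>"]
    for item in items:
        parts.append("<li>" + escape(item) + "</li>")
    parts.append("</ul></body></html>")
    return "".join(parts)


# Template style: the markup lives in one readable place, separate from logic.
PAGE = Template(
    "<html><head><title>$title</title></head><body><ul>$rows</ul></body></html>"
)
ROW = Template("<li>$item</li>")


def render_template(title, items):
    """Fill in the page template with escaped values."""
    rows = "".join(ROW.substitute(item=escape(item)) for item in items)
    return PAGE.substitute(title=escape(title), rows=rows)
```

Both functions produce the same HTML, but in the template version the page structure can be read and changed without touching the Python logic.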

Our current in-development version of jToolkit has addressed a lot of other issues and made everything rosy and shiny, which may end up helping Pootle, but I won't go into that now.

jToolkit can interface with Apache via mod_python - the problem for Pootle has been issues around locking files etc in a multiprocess situation, which I am currently looking at. So at the moment it has to be run as a standalone web server (based on the builtin Python web server), but that should change soon.
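
For readers unfamiliar with the multiprocess locking problem, here is one common approach on Unix-like systems: an advisory lock taken on a companion lock file before writing. This is a generic sketch, not the solution Pootle uses or will use; the helper names and the ".lock" suffix are assumptions, and fcntl is not available on Windows.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on path + '.lock' for the with-block."""
    with open(path + ".lock", "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def append_line(path, line):
    """Append a line to path while holding the lock for that file."""
    with exclusive_lock(path):
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(line + "\n")
```

Because the lock is advisory, it only protects the file if every process that writes to it goes through the same locking helper.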

Working Together
----------------

It really seems like Debian and WordForge's needs and goals are aligned, and this isn't just coincidental, it's because we have a similar philosophy. From my reading of the discussions it seems like we have the same idea about the challenges and solutions to managing free software localization. Long term I am sure we will end up with a great system that benefits not only these two projects but the whole free software and localization communities.

For the GSoC and the short term, it is important to outline what the main goals to be achieved are. Then we will see the best way forward for the project together that benefits everyone. So as Christian and others have said, I think this initial discussion is really important. I'd really like everyone to work out of the same version control tree, and to be sharing and talking often as we progress, so that we keep moving forward together. (We use IRC: #pootle on irc.debian.org/irc.freenode.org is the usual channel, and any discussion is welcome there.)

So, I hope that information is useful, particularly to all those who have been involved in the discussion and planning (what is the status of the GSoC project? accepted?)

I have left out a lot of the plans about making WordForge a distributed system etc simply because I hope everyone is aware of them, so this is just a current technical status info email.

Any questions welcome, I look forward to more discussion and working together

Cheers
David

PS I run Fedora on my home machine. Is that a sin? I use apt repositories though, so maybe I can be excused :-)


