Wordforge, Pootle, GSoC, Debian
Hi all
I'm part of the WordForge team and a developer on Pootle and the
Translate Toolkit ...
I've just read through hundreds of mails in the Debian-i18n list
archive, as well as the i18n and l10n paper for DebConf (eagerly
awaiting the video of the sessions...). It's been great to read the
discussion. I am excited about where things are headed and how many of
the ideas in the original discussion on this list are aligned with the
ideas we have had in the WordForge project.
I thought I would provide some technical details on where Pootle is
currently at and how it is designed, which will hopefully shed some
light on some of the discussion. It may have been more helpful earlier,
but I wanted to make sure I had followed the whole discussion first,
hence the delay. By now some of this information is probably redundant,
but anyway here we go...
History and Design
-------------------
We started working at the translate.org.za South African translation
project on scripts that would enable us to translate Mozilla using PO
files. We were dealing with 11 languages and with non-technical
translators who weren't very skilled with computers. So we started
chipping away at the many different things you had to know in order to
translate. Next we included tools to translate OpenOffice.org using PO
files (migrated from some Perl tools that did this).
Every now and then we discussed doing a web portal, but none of our
attempts (or other people's that we looked at) seemed to meet our
requirements. Dealing with multiple languages and multiple big upstream
projects was part of the perspective from the beginning.
Finally after OOoCon in September 2004, I started working on Pootle,
with a slightly different philosophy to before. Simplicity seemed to
work and it took off fairly quickly. With collaboration from others, we
began to see what we thought was really needed, and Javier and others
have spent a lot of time documenting this and strategising about it.
Structure
---------
Explanation of different parts of the project:
The Translate Toolkit consists of:
- different storage modules that handle translation formats (PO, XLIFF,
Mozilla and OpenOffice.org formats, more document-centric formats like
HTML, hopefully others...)
- conversion tools to convert between the above formats
- checks that can be performed on translations (punctuation,
capitalization, spelling, length of translation, etc, etc)
- other tools for manipulating translations (search, combining, merging etc)
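To make the idea of checks concrete, here is a minimal sketch of three
of the kinds listed above (punctuation, capitalization, length). The
function names, the check names and the max_ratio threshold are
illustrative assumptions, not the Translate Toolkit's actual API.

```python
import string


def check_end_punctuation(source: str, target: str) -> bool:
    """Pass when source and target end with the same punctuation mark,
    or when neither ends with punctuation."""
    def trailing(text: str) -> str:
        text = text.rstrip()
        return text[-1] if text and text[-1] in string.punctuation else ""
    return trailing(source) == trailing(target)


def check_start_capital(source: str, target: str) -> bool:
    """Pass when both strings agree on starting with a capital letter."""
    def starts_upper(text: str) -> bool:
        text = text.lstrip()
        return bool(text) and text[0].isupper()
    return starts_upper(source) == starts_upper(target)


def check_length(source: str, target: str, max_ratio: float = 3.0) -> bool:
    """Pass when neither string is more than max_ratio times longer than
    the other. Empty strings are not judged here (assumed to be handled
    by a separate 'untranslated' check)."""
    if not source or not target:
        return True
    longer = max(len(source), len(target))
    shorter = min(len(source), len(target))
    return longer / shorter <= max_ratio


# Hypothetical registry: check name -> check function.
CHECKS = {
    "endpunc": check_end_punctuation,
    "startcaps": check_start_capital,
    "length": check_length,
}


def failing_checks(source: str, target: str) -> list:
    """Return the names of all checks that the translation fails."""
    return [name for name, check in CHECKS.items()
            if not check(source, target)]
```

For example, failing_checks("Save file.", "Stoor lêer") reports only the
end-punctuation check, because the translation dropped the full stop.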
The storage modules are geared towards not messing up the original
source format of the translation file (translation files are documents,
not just repositories of translations), and towards providing access to
as much detail as possible.
We (Friedel in particular) are currently working towards a common API
for the different translation formats, so that you can work with any of
them using the same tools.
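As a rough illustration of what a common API across formats could look
like, here is a sketch with one base class and two toy formats. All
names (Unit, Store, MiniPOStore, TabStore, untranslated) are
hypothetical, and the PO parser handles only single-line msgid/msgstr
pairs; it is not the toolkit's real storage code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Unit:
    """One translatable string and its translation (empty if untranslated)."""
    source: str
    target: str = ""


class Store(ABC):
    """Format-independent interface: every format exposes a list of units."""

    def __init__(self):
        self.units = []

    @classmethod
    @abstractmethod
    def parse(cls, text: str) -> "Store":
        """Build a store from the text of a translation file."""

    @abstractmethod
    def serialize(self) -> str:
        """Write the units back out in this store's format."""


class MiniPOStore(Store):
    """Toy PO subset: single-line msgid/msgstr pairs, no escapes or comments."""

    @classmethod
    def parse(cls, text):
        store = cls()
        source = None
        for line in text.splitlines():
            if line.startswith('msgid "') and line.endswith('"'):
                source = line[len('msgid "'):-1]
            elif (line.startswith('msgstr "') and line.endswith('"')
                  and source is not None):
                store.units.append(Unit(source, line[len('msgstr "'):-1]))
                source = None
        return store

    def serialize(self):
        return "\n".join(f'msgid "{u.source}"\nmsgstr "{u.target}"\n'
                         for u in self.units)


class TabStore(Store):
    """Toy format: one 'source<TAB>target' pair per line."""

    @classmethod
    def parse(cls, text):
        store = cls()
        for line in text.splitlines():
            if line.strip():
                source, _, target = line.partition("\t")
                store.units.append(Unit(source, target))
        return store

    def serialize(self):
        return "".join(f"{u.source}\t{u.target}\n" for u in self.units)


def untranslated(store: Store) -> list:
    """A tool written once against Store works with every format."""
    return [u.source for u in store.units if not u.target]
```

The point of the design is that a tool such as untranslated() never
needs to know which file format it is looking at, and a converter only
has to copy units from one store class to another.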
Pootle is (at least) a web application for managing translations. It is
built on top of the Translate Toolkit, and utilises its storage
modules, checks and conversion tools.
Currently Pootle only works directly with PO files and converts to other
formats if required for download. However, it is in the process of being
moved towards handling any of the formats natively (particularly XLIFF,
which has a lot of useful functionality).
Internally, Pootle is fairly well modularized. Here are the details:
- modules that build the web pages and handle form submissions (python
files and KID templates)
* adminpages indexpage pagelayout translatepage templates/*.html
- modules that extend the underlying storage interface, and manage
reading and writing translations and project file trees
* pootlefile potree projects versioncontrol
- some minor scripts, tests etc
* benchmark conflict2suggest conftest test_*
- basic project stuff
* __init__ __version__ filelocations (and the setup script)
Stuff that isn't totally modularized:
- the main pootle.py contains the following:
* stuff that directs URL requests to the appropriate pages (and
checks user permissions)
* some form handling happens here that should happen in the pages etc
* commandline parsing
* handling of user options like the UI language
* some code to recreate statistics
- users.py contains the following
* code for login, registration, activation and user options pages
* the web Session class which contains web interface options but
also a bit of stuff on user rights
I see one of the main ideas is separating out the frontend from the
backend. This is of course a good idea, but a fair amount of that is
modularized already. My interpretation is that what is meant is not just
modularising the code but creating the ability to interface with the
server in other ways than the web interface (through XML-RPC, email,
etc, etc). This is of course very important.
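As a sketch of what a second interface onto the same backend might look
like, the following uses Python's standard xmlrpc.server module. The
TranslationBackend class, its method names and the in-memory storage
are assumptions made for illustration; Pootle's real backend works on
files and is structured differently.

```python
from xmlrpc.server import SimpleXMLRPCServer


class TranslationBackend:
    """Hypothetical backend, independent of any particular frontend."""

    def __init__(self):
        # (project, language) -> {source string: translation}
        self._units = {}

    def add_unit(self, project, language, source, target=""):
        """Register a string for translation."""
        self._units.setdefault((project, language), {})[source] = target
        return True

    def get_untranslated(self, project, language):
        """Return the source strings that still have no translation."""
        units = self._units.get((project, language), {})
        return sorted(s for s, t in units.items() if not t)

    def submit_translation(self, project, language, source, target):
        """Store a translation; return False for an unknown string."""
        units = self._units.get((project, language), {})
        if source not in units:
            return False
        units[source] = target
        return True


def make_xmlrpc_server(backend, host="127.0.0.1", port=8000):
    """Expose the backend's public methods over XML-RPC.

    The web pages could call the same backend object directly, so the
    web interface and XML-RPC become two frontends onto one backend.
    """
    server = SimpleXMLRPCServer((host, port), allow_none=True,
                                logRequests=False)
    server.register_instance(backend)
    return server
```

Calling make_xmlrpc_server(backend).serve_forever() would start serving,
and a client could then use xmlrpc.client.ServerProxy to call
get_untranslated or submit_translation remotely. Authentication and
user rights are deliberately left out of this sketch.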
It remains to be seen how much separation is necessary. My
recommendation would be to start by detailing the features
required/expected (a fair amount of that has been done) and then to
begin to implement them in the current structure. As that progresses it
can be decided how much needs to be separated into different running
programs etc. I'm not sure that we need to introduce a separate file
server / web interface yet in terms of process structure (although
conceptually that is mostly there already).
There are quite a few details in terms of how Pootle manages files, text
indexing, statistics etc, as well as performance and scalability
questions, but I will leave those for another thread later (but don't
worry, we do think about these things).
The final part of the current essential structure is the web framework used.
Pootle was written using jToolkit (http://jtoolkit.sourceforge.net).
Basically the situation is that I am employed by another company to do
commercial software development, and this is the free software web
framework that we produce and use at my company. So for me the easiest
way to start writing Pootle was to use it. It, like any web framework,
has strengths and weaknesses.
The most glaring problem with it was that all the user interface code
was generated procedurally in Python (some other toolkits take this
approach), and it was quite ugly to read, and meant improving the
interface was slow. So we have replaced that all with Kid
(http://kid.lesscode.org) templates (not available when Pootle was
started) which are very nice, and have improved stuff a lot and made the
code more readable.
Our current in-development version has addressed a lot of other issues
and made everything rosy and shiny, which may end up helping Pootle, but
I won't go into that now.
jToolkit can interface with Apache via mod_python - the problem for
Pootle has been issues about locking files etc in a multiprocess
situation, which I am currently looking at. So at the moment it has to
be run as a standalone web server (based on the builtin Python web
server), but that should change soon.
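To illustrate the kind of multiprocess file locking involved, here is
one possible approach using advisory locks from Python's standard fcntl
module (POSIX only). It is a sketch of the general technique, not a
description of how Pootle solves the problem; the helper names and the
".lock" / ".tmp" suffixes are assumptions.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(path):
    """Hold an exclusive advisory lock on '<path>.lock' for the block.

    Other processes using the same helper wait here until the lock is
    released. Processes that ignore the lock are not stopped by it.
    """
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until available
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_file(path, transform):
    """Read-modify-write a translation file safely across processes.

    transform receives the old text ('' if the file does not exist yet)
    and returns the new text.
    """
    with locked(path):
        try:
            with open(path, encoding="utf-8") as f:
                old = f.read()
        except FileNotFoundError:
            old = ""
        # Write to a temporary file and rename it into place, so readers
        # never see a half-written file.
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(transform(old))
        os.replace(tmp, path)
```

Each web server process would wrap its writes in update_file(), so two
processes saving translations to the same file take turns instead of
overwriting each other's changes.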
Working Together
----------------
It really seems like Debian and WordForge's needs and goals are aligned,
and this isn't just coincidental, it's because we have a similar
philosophy. From my reading of the discussions it seems like we have the
same idea about the challenges and solutions to managing free software
localization. Long term I am sure we will end up with a great system
that benefits not only these two projects but the whole free software
and localization communities.
For the GSoC and the short term, it is important to outline the main
goals to be achieved. Then we can work out together the best way forward
for the project, one that benefits everyone. So as Christian and others
have said, I think this initial discussion is really important. I'd
really like everyone to work out of the same version control tree, and
to be sharing and talking often as we progress, so that we keep moving
forward together. (We use IRC: #pootle on
irc.debian.org/irc.freenode.org is the usual channel, and any discussion
is welcome there.)
So, I hope that information is useful, particularly to all those who
have been involved in the discussion and planning (what is the status of
the GSoC project? accepted?)
I have left out a lot of the plans about making WordForge a distributed
system etc, simply because I hope everyone is aware of them, so this is
just an email about the current technical status.
Any questions welcome, I look forward to more discussion and working
together
Cheers
David
PS I run Fedora on my home machine. Is that a sin? I use apt
repositories though, so maybe I can be excused :-)