
Re: [translate-pootle] Wordforge



Hi,

> As the others have mentioned, yes, we are doing 2 major restructure:
> 
> 1) base classing of our convertor classes and Pootle
> 2) Locking
> 
> Some of this does affect backend separation.
> 
> There of course is lots of work, but I understand that yours needs to
> be quite independent.  We have been discussing the concept of change
> queues which relates to locking ie we don't want to step on someone
> else's change if for some reason you had yours out for too long.
> Queues would also allow us to create a distributed environment. This
> direction could be the most fruitful for your work.
> 
> But as the others have said #pootle is a good place to ask.

I have a strong opinion about the direction Pootle's backend should be
headed. I think that at the moment you have a 'loose' system based on
files, which is simple and transparent.  However, it is inefficient
in memory usage and speed, and hard to distribute.  Since memory usage
depends on the database size, I take it that you are using some sort of
memory cache to speed up the system.

I think that an obvious solution here is to use a relational database
(I would suggest PostgreSQL).  Unlike ordinary files, it allows
extremely speedy random writes which is exactly what we need here.
I would expect the problem of high memory usage to disappear completely
too.  In fact, if we can put all important data in the database,
distribution of the system would then become trivial -- several
instances of the application (possibly on different computers) would
simply use a single instance of the database.  I would say that by doing
everything (indexing, locking, etc.) manually we're reinventing the
wheel, badly.
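
To make the idea concrete, here is a minimal sketch of translation
units stored in a single indexed table.  Everything in it is an
assumption for illustration: the `units` table, its columns and the
helper functions are hypothetical and are not part of Pootle, and
Python's built-in sqlite3 module stands in for PostgreSQL only so that
the example runs without a server.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical units table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS units (
               project  TEXT NOT NULL,
               language TEXT NOT NULL,
               source   TEXT NOT NULL,
               target   TEXT NOT NULL DEFAULT '',
               PRIMARY KEY (project, language, source)
           )"""
    )
    return conn


def add_unit(conn, project, language, source, target=""):
    """Insert one translation unit inside its own transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO units (project, language, source, target) "
            "VALUES (?, ?, ?, ?)",
            (project, language, source, target),
        )


def set_translation(conn, project, language, source, target):
    """Update a single unit; the primary-key index locates the row."""
    with conn:
        cur = conn.execute(
            "UPDATE units SET target = ? "
            "WHERE project = ? AND language = ? AND source = ?",
            (target, project, language, source),
        )
    return cur.rowcount == 1  # False if no such unit exists


def get_translation(conn, project, language, source):
    """Return the stored translation, or None if the unit is unknown."""
    row = conn.execute(
        "SELECT target FROM units "
        "WHERE project = ? AND language = ? AND source = ?",
        (project, language, source),
    ).fetchone()
    return row[0] if row else None
```

Changing one string touches one row, and the database takes care of
indexing and transactions.  With a server-based database, several
application instances could share the same table in the same way.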

I also think that using XLIFF, an XML format, for the backend is a bad
idea.  I think that XML is great for serializing data and sharing it
between completely disparate systems, but it's awful for random writes
and places where performance is important (such as the data storage
backend for a heavily used system). Nobody cares whether the backend
storage is compliant with some standard, it's only the interface where
standards-compliance matters. The backend must simply be as efficient
as possible and not get in the way.

I do not mean to trash your design decisions or stall work on the
backend.  Files as a backend are great for small projects where
performance is unimportant, because then you don't need to set up an
SQL server.  I just want to suggest designing the API in such a way
that it does not depend on files, i.e., such that a relational database
would not be too much trouble to plug in.
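
As a rough sketch of what such an API could look like (all names here
are hypothetical, not existing Pootle classes): callers talk only to a
small abstract interface, and a file-based or SQL-based implementation
can be swapped in behind it.  The file format is a toy tab-separated
one and sqlite3 again stands in for a real database server.

```python
import sqlite3
from abc import ABC, abstractmethod


class TranslationStore(ABC):
    """Hypothetical storage interface; callers never see files or SQL."""

    @abstractmethod
    def get(self, source):
        """Return the translation of `source`, or None if unknown."""

    @abstractmethod
    def put(self, source, target):
        """Store `target` as the translation of `source`."""


class FileStore(TranslationStore):
    """Toy file backend: one 'source<TAB>target' pair per line.

    Assumes strings contain no tabs or newlines.  Every put() rewrites
    the whole file, which is the cost a database avoids.
    """

    def __init__(self, path):
        self.path = path

    def _load(self):
        units = {}
        try:
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    source, _, target = line.rstrip("\n").partition("\t")
                    units[source] = target
        except FileNotFoundError:
            pass  # no file yet means an empty store
        return units

    def get(self, source):
        return self._load().get(source)

    def put(self, source, target):
        units = self._load()
        units[source] = target
        with open(self.path, "w", encoding="utf-8") as f:
            for s, t in units.items():
                f.write(f"{s}\t{t}\n")


class SqlStore(TranslationStore):
    """SQL backend using any sqlite3 connection."""

    def __init__(self, conn):
        self.conn = conn
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS units ("
                "source TEXT PRIMARY KEY, target TEXT NOT NULL)"
            )

    def get(self, source):
        row = self.conn.execute(
            "SELECT target FROM units WHERE source = ?", (source,)
        ).fetchone()
        return row[0] if row else None

    def put(self, source, target):
        with self.conn:  # one transaction per write
            self.conn.execute(
                "INSERT OR REPLACE INTO units (source, target) VALUES (?, ?)",
                (source, target),
            )


def translate(store, source, target):
    """Example caller: works with either backend unchanged."""
    store.put(source, target)
    return store.get(source)
```

Small installations could keep using something like FileStore, while a
large one would pass in an SqlStore instead, without any change to the
calling code.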

I am concerned with this issue because my SoC project is for Debian,
not for Pootle directly, so I will need a backend that would handle
*all* Debian translations at the same time.  That would mean gigabytes
of data -- a "small" relational database by modern standards, but
seemingly infeasible with the current structure because of huge
resource usage. And I need it by the end of summer ;)  Obviously I will
be working on this, but I would like to make sure in advance that you
will not be working in the opposite direction.

During our discussion in Riga, Aigars seemed to agree with the idea of
hooking Pootle up to a relational database.

I would be happy to hear your thoughts.  I hope this letter did not
come across as too harsh.  There may be more options here, or you may
have some plans that I am simply not aware of.  However, this is
critical for my work for Debian and I want to settle it ASAP.

Best regards,
-- 
Gintautas Miliauskas
http://gintasm.blogspot.com
