
Re: [translate-pootle] Wordforge



Hello Gintautas,

On Thu, Jun 08, 2006 at 12:17:28AM +0300, Gintautas Miliauskas wrote:
> Hi,
> 
> I have a strong opinion about the direction Pootle's backend should be
> headed. I think that at the moment you have a 'loose' system based on
> files, which is simple and transparent.  However, it is inefficient
> in memory usage, speed and ease of distribution.  Since memory usage
> depends on the database size, I take it that you are using some sort of
> memory cache to speed up the system.
> 
> I think that an obvious solution here is to use a relational database
> (I would suggest PostgreSQL).  Unlike ordinary files, it allows
> extremely speedy random writes which is exactly what we need here.
> I would expect the problem of high memory usage to disappear completely
> too.  In fact, if we can put all important data on the database,
> distribution of the system would then become trivial -- several
> instances of the application (possibly on different computers) would
> simply use a single instance of the database.  I would say that by doing
> everything (indexing, locking, etc.) manually we're reinventing the
> wheel, badly.

I'm not so sure a database would bring that significant a performance
improvement.

Writing to the database will probably be faster than writing an XLIFF
file.
But users may want to retrieve XLIFF files. That operation will be faster
if the strings are already stored in an XLIFF file.

Also, I'm not sure a Pootle server spends most of its time on write
operations (the number of write operations is probably proportional to
the number of users).
The CPU may be busier doing fuzzy matching of strings. I'm not sure the
fuzzy matching algorithm can use some kind of cache in a database. (The
number of fuzzy matching operations grows more than proportionally with
the number of strings, which IMHO reflects the size of the translation
server better than the number of simultaneous users triggering write
operations.)
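
To illustrate the cost I have in mind, here is a minimal sketch. It
assumes Python's standard difflib as a stand-in matcher (I am not
claiming this is what Pootle uses), and the names similarity and
fuzzy_matches are hypothetical. One lookup compares the query against
every stored string, so matching every string against every other one
takes about n*(n-1)/2 comparisons for n strings. The lru_cache is only
an in-process memo, which is exactly the kind of cache I am not sure a
database could replace.

```python
import difflib
from functools import lru_cache


@lru_cache(maxsize=4096)
def similarity(a, b):
    """Return a 0..1 similarity ratio (difflib is an assumed stand-in matcher)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def fuzzy_matches(query, strings, threshold=0.7):
    """Compare the query against every stored string: one similarity call each."""
    scored = [(similarity(query, s), s) for s in strings]
    # Keep candidates at or above the threshold, best match first.
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical sample data, for illustration only.
strings = ["Open file", "Open files", "Close file", "Save as..."]
matches = fuzzy_matches("Open file", strings)
```

Repeating the same lookup hits the memo instead of recomputing the
ratios, but the first pass over a large string set is still CPU-bound,
whatever the storage behind it.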


> I also think that using XLIFF, an XML format, for the backend is a bad
> idea.  I think that XML is great for serializing data and sharing it
> between completely disparate systems, but it's awful for random writes
> and places where performance is important (such as the data storage
> backend for a heavily used system). Nobody cares whether the backend
> storage is compliant with some standard, it's only the interface where
> standards-compliance matters. The backend must simply be as efficient
> as possible and not get in the way.
> 
> I do not mean to thrash your design decisions or stall work on the
> backend.  Files as backend are great for small projects where
> performance is unimportant, because then you don't need to set up an
> SQL server.  I just want to suggest designing the API in such a way that
> does not depend on files, i.e., such that a relational database would
> not be too much trouble to plug in.

I'm not that familiar with Pootle. Maybe the base.TranslationStore (or
TranslationUnit) API can be used for database storage.

It could be interesting to investigate later the performance gain such
a storage would give. So if you think a method is missing or should be
generalized (to allow faster searches in a database), it would be nice
to know that in advance.
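
As a rough sketch of what a storage-independent API could look like
(this is not the actual base.TranslationStore interface; UnitStore,
DictStore, SqliteStore and their methods are hypothetical names, and
sqlite3 stands in for PostgreSQL only so the example runs without a
server):

```python
import sqlite3
from abc import ABC, abstractmethod


class UnitStore(ABC):
    """Hypothetical storage interface; callers never see files or SQL."""

    @abstractmethod
    def get_target(self, source): ...

    @abstractmethod
    def set_target(self, source, target): ...

    @abstractmethod
    def find_sources(self, substring): ...


class DictStore(UnitStore):
    """In-memory stand-in for a file-backed store."""

    def __init__(self):
        self._units = {}

    def get_target(self, source):
        return self._units.get(source)

    def set_target(self, source, target):
        self._units[source] = target

    def find_sources(self, substring):
        return sorted(s for s in self._units if substring in s)


class SqliteStore(UnitStore):
    """Database-backed store; the search is delegated to the SQL engine."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS units "
            "(source TEXT PRIMARY KEY, target TEXT)"
        )

    def get_target(self, source):
        row = self._db.execute(
            "SELECT target FROM units WHERE source = ?", (source,)
        ).fetchone()
        return row[0] if row else None

    def set_target(self, source, target):
        self._db.execute(
            "INSERT OR REPLACE INTO units (source, target) VALUES (?, ?)",
            (source, target),
        )
        self._db.commit()

    def find_sources(self, substring):
        rows = self._db.execute(
            "SELECT source FROM units WHERE instr(source, ?) > 0 "
            "ORDER BY source",
            (substring,),
        )
        return [r[0] for r in rows]
```

Code written against UnitStore works with either backend, and a method
like find_sources is the kind of generalized search that a database
backend could later answer with an index instead of a scan.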

> I would be happy to hear your thoughts.  I hope the letter did not come
> out too harsh.  There may be more options here, or you may have some
> plans that I am simply not aware of.  However, this is critical for my
> work for Debian and I want to cover this ASAP.

If you can work on an API for the storage method, this won't be critical,
and could be sorted out during an optimization phase (Pootle 1.5.1 in
Wordforge's roadmap).

Kind Regards,
-- 
Nekral


