
[Fwd: Re: [translate-pootle] Wordforge]



I just saw that I forgot to CC the Debian list with my reply of
yesterday.


On Thu, 2006-06-08 at 00:17 +0300, Gintautas Miliauskas wrote:
> Hi,
> 

Hi

There are a lot of things to reply to. I've CC'd the developers list
(this is the users list).

Hopefully the others will also respond and give a more thorough
reply.

> > As the others have mentioned, yes, we are doing two major restructurings:
> > 
> > 1) base classing of our convertor classes and Pootle
> > 2) Locking
> > 
> > Some of this does affect backend separation.
> > 
> > There is, of course, a lot of work, but I understand that yours needs
> > to be quite independent.  We have been discussing the concept of change
> > queues, which relates to locking, i.e. we don't want to overwrite
> > someone else's change just because you had yours checked out for too
> > long.  Queues would also allow us to create a distributed environment.
> > This direction could be the most fruitful for your work.
> > 
> > But as the others have said #pootle is a good place to ask.
> 
> I have a strong opinion about the direction Pootle's backend should be
> headed. I think that at the moment you have a 'loose' system based on
> files, which is simple and transparent.  However, it is inefficient
> in memory usage, speed and ease of distribution.  Since memory usage
> depends on the database size, I take it that you are using some sort of
> memory cache to speed up the system.
> 

Indeed. There is a cache.

> I think that an obvious solution here is to use a relational database
> (I would suggest PostgreSQL).  Unlike ordinary files, it allows
> very fast random writes, which is exactly what we need here.
> I would expect the problem of high memory usage to disappear completely
> too.  In fact, if we can put all important data on the database,
> distribution of the system would then become trivial -- several
> instances of the application (possibly on different computers) would
> simply use a single instance of the database.  I would say that by doing
> everything (indexing, locking, etc.) manually we're reinventing the
> wheel, badly.
> 

Well, we must remember that Pootle does a lot of things. I mostly agree
with what you are saying. We all understand why databases are good at
these things. But Pootle is also a file server. It doesn't necessarily
make sense to convert everything to a database and back each time you
want to do a file operation. For some people the file management is the
desirable part. We try to minimise diffs for the benefit of people
needing to review patches for upstream projects and things like that.
Some projects are _very_ picky about these things. So I believe files do
have a place for some of our users. Of course we'll run into
problems when we have a thousand translators busy with interactive
translation, and we have already realised and discussed many of the
things you mention. Eventually we want to make Pootle as distributed as
possible, not just in the sense that you mean, but even with more
independent installations being able to interact.
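The change-queue idea mentioned earlier in the thread (not overwriting someone else's change just because yours was checked out for too long) could work roughly like this. It is a minimal Python sketch, assuming each unit carries a revision counter; the `Unit`, `Change` and `ChangeQueue` names are hypothetical and not part of Pootle or the Translate Toolkit.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Unit:
    """One translatable string plus a revision counter (hypothetical model)."""
    source: str
    target: str = ""
    revision: int = 0


@dataclass
class Change:
    """A proposed edit that remembers which revision the translator started from."""
    unit_id: str
    new_target: str
    base_revision: int
    author: str


class ChangeQueue:
    """Hypothetical change queue: edits are queued and applied in order.

    A change is set aside as a conflict if the unit was modified after the
    translator fetched it, so a stale edit never overwrites a newer one.
    """

    def __init__(self, units):
        self.units = units      # unit_id -> Unit
        self.pending = deque()  # changes waiting to be applied
        self.conflicts = []     # stale changes kept for human review

    def submit(self, change):
        self.pending.append(change)

    def apply_all(self):
        while self.pending:
            change = self.pending.popleft()
            unit = self.units[change.unit_id]
            if unit.revision != change.base_revision:
                # Someone else committed first; don't clobber their work.
                self.conflicts.append(change)
                continue
            unit.target = change.new_target
            unit.revision += 1


# Alice and Bob both started from revision 0 of the same unit.
units = {"greeting": Unit(source="Hello")}
queue = ChangeQueue(units)
queue.submit(Change("greeting", "Hallo", base_revision=0, author="alice"))
queue.submit(Change("greeting", "Goeie dag", base_revision=0, author="bob"))
queue.apply_all()
```

Alice's change is applied and Bob's ends up in `queue.conflicts` instead of silently replacing hers. Since changes are applied from a queue rather than written directly, the same queue could in principle be passed between installations, which is where the distribution argument comes in.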

As for specific technology - jToolkit (that we already use) has support
for many databases so we'll probably use that functionality so that we
don't become tied to any specific one.

> I also think that using XLIFF, an XML format, for the backend is a bad
> idea.  I think that XML is great for serializing data and sharing it
> between completely disparate systems, but it's awful for random writes
> and places where performance is important (such as the data storage
> backend for a heavily used system). Nobody cares whether the backend
> storage is compliant with some standard, it's only the interface where
> standards-compliance matters. The backend must simply be as efficient
> as possible and not get in the way.
> 

Here I agree. Some people I know would say XML isn't good for
anything :-)  I believe that eventually XLIFF will be used for
interchange only, and probably TMX as well.
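Using XLIFF for interchange only could look something like this: units live in whatever backend is convenient, and an XLIFF 1.2 document is produced only on export. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `to_xliff` function and the `(unit_id, source, target)` tuple layout are assumptions for illustration, not an existing toolkit API.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize as the default namespace


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def to_xliff(units, original, source_lang, target_lang):
    """Serialize (unit_id, source, target) tuples as an XLIFF 1.2 string.

    The store behind `units` can be anything; XML is only produced here,
    when translations leave the system.
    """
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit_id, source, target in units:
        tu = ET.SubElement(body, q("trans-unit"), id=unit_id)
        ET.SubElement(tu, q("source")).text = source
        if target:  # untranslated units simply have no <target>
            ET.SubElement(tu, q("target")).text = target
    return ET.tostring(root, encoding="unicode")


doc = to_xliff([("1", "Open file", "Maak lêer oop"), ("2", "Quit", "")],
               original="menu.po", source_lang="en", target_lang="af")
```

The expensive part (building and parsing XML) only happens at the system boundary, so the backend itself never has to do random writes into an XML file.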

> I do not mean to trash your design decisions or stall work on the
> backend.  Files as backend are great for small projects where
> performance is unimportant, because then you don't need to set up an
> SQL server.  I just want to suggest designing the API so that it
> does not depend on files, i.e., so that a relational database would
> not be too much trouble to plug in.
> 

Yes. That is exactly what we have already started with to some extent.
Initially Pootle was entirely PO-file based and, of course, one needs to
start small. All our tools worked with PO files, so this was logical.
But now we have developed a more general API so that all our tools (not
just Pootle) can work with translation stores in a general way. Now hopefully a
database can become another storage mechanism that plugs into our API.
That way we also don't force people to start using a database if they
don't want/need to.
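A minimal sketch of what "a database as another storage mechanism that plugs into our API" could mean, assuming a hypothetical `TranslationStore` interface (the class names here are illustrative, not the toolkit's actual API). `sqlite3` stands in for a server such as PostgreSQL so the example runs on its own, and units are keyed by source text only, which is a simplification.

```python
import sqlite3
from abc import ABC, abstractmethod


class TranslationStore(ABC):
    """Hypothetical storage interface; callers never see how units are kept."""

    @abstractmethod
    def get_target(self, source):
        """Return the translation of `source`, or None if untranslated."""

    @abstractmethod
    def set_target(self, source, target):
        """Store or replace the translation of `source`."""


class DictStore(TranslationStore):
    """Stand-in for the existing file-based backend: everything in memory."""

    def __init__(self):
        self._units = {}

    def get_target(self, source):
        return self._units.get(source)

    def set_target(self, source, target):
        self._units[source] = target


class SQLStore(TranslationStore):
    """Relational backend; sqlite3 stands in for a server such as PostgreSQL."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS units (source TEXT PRIMARY KEY, target TEXT)"
        )

    def get_target(self, source):
        row = self._db.execute(
            "SELECT target FROM units WHERE source = ?", (source,)
        ).fetchone()
        return row[0] if row else None

    def set_target(self, source, target):
        # Single-row upsert: only the changed unit is written, not a whole file.
        self._db.execute(
            "INSERT OR REPLACE INTO units (source, target) VALUES (?, ?)",
            (source, target),
        )
        self._db.commit()


def translate(store, source, target):
    """Tool code written against the interface works with any backend."""
    store.set_target(source, target)
    return store.get_target(source)
```

Because `translate` only talks to the interface, people who are happy with files keep the file-based store, while a large installation could switch to the SQL one without changing tool code.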

> I am concerned with this issue because my SoC project is for Debian,
> not for Pootle directly, so I will need a backend that would handle
> *all* Debian translations at the same time.  That would mean gigabytes
> of data -- a "small" relational database by modern standards, but
> seemingly infeasible with the current structure because of huge
> resource usage. And I need it by the end of summer ;)  Obviously I will
> be working on this, but I would like to make sure in advance that you
> will not be working in the opposite direction.
> 

No, I don't see our goals to be conflicting at all. Our previous public
installation of Pootle was getting slow later on. Recently I learnt that
it was running on a machine from the previous century :-)  (Pentium or
something).  But of course the Debian files would mean a new scale of
operation. And I don't believe working on that is in conflict with what
we are doing. But I believe it is also useful to retain the current
approach, since I think it is exactly what some people want.

> During the discussion with Aigars in Riga he seemed to agree with the
> idea of hooking Pootle to a relational database.
> 
> I would be happy to hear your thoughts.  I hope the letter did not come
> out too harsh.  There may be more options here, or you may have some
> plans that I am simply not aware of.  However, this is critical for my
> work for Debian and I want to cover this ASAP.
> 
> Best regards,

Thank you for your thorough posting. I'm not offended at all, if that is
what you feared :-)  Hopefully the others will also reply more
thoroughly. Just wanted to start the discussion.

F


