Re: [Fwd: Re: [translate-pootle] Wordforge]



Hello,

Thanks for your letter; it helped me understand some of the motivation
for preserving files, namely that the current system basically works
with files as such.  I am just not sure that this is a viable strategy
in the long term.  For some reason people pay no attention at all to
the use case of online web translation.  It may not be the primary
goal, but I am sure that if we do it right it will be immensely
popular.  The hidden reasoning is that "it's not used now, and
generally things don't change".  Well, such a thing does not exist
yet; that's why it's not used.  Look at Rosetta: it has huge flaws (no
search on strings, slow uploads, an outrageously complex .po download
procedure, ...), and yet it is far from unpopular.

For now, I would like to concentrate on the backend.  It is the base
of the system and therefore important, even if it does not directly
affect the functional specification.

In my previous letter I put the important part (about the API) last,
and most people seemed to miss it and focus on RDBMSes instead.

I do not really care about the actual implementation of the backend.
The reason I am advocating an RDBMS is to influence the design of the
backend.  In the current situation, an API that just presents parsed
files and writes objects back would suffice, but with such an API it
would be impossible to build an efficient database-based backend.

Here's a single simple functional requirement that I propose for the
backend:

=============
It should be possible to write a separate program that can do
everything using only the backend object.  No assumptions about files,
nothing of the kind, just manipulating objects.  I should not have to
care about locking, other servers using the database in parallel, the
OS, the filesystem or anything else.  That also includes uploading new
languages and new projects.
=============

For example, say I want an XML-RPC server.  I should be able to just
write a Python script that imports some classes from pootle.backend
(or whatever), defines some functions and exposes them.
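
A minimal sketch of what such a script could look like.  It assumes a
hypothetical InMemoryBackend class as a stand-in for whatever
pootle.backend ends up providing (no such class exists today), and an
arbitrary default port of 8000.  The only library used is Python's
standard xmlrpc.server.SimpleXMLRPCServer; the script never touches
files or locks.

```python
from xmlrpc.server import SimpleXMLRPCServer


class InMemoryBackend:
    """Hypothetical stand-in for a pootle.backend object.

    Callers see only object operations; files, locking and the storage
    engine stay hidden behind this interface.
    """

    def __init__(self):
        # project name -> language code -> {msgid: msgstr}
        self._projects = {}

    def add_project(self, project):
        self._projects.setdefault(project, {})

    def add_language(self, project, lang):
        self._projects[project].setdefault(lang, {})

    def set_translation(self, project, lang, msgid, msgstr):
        self._projects[project][lang][msgid] = msgstr

    def get_translation(self, project, lang, msgid):
        return self._projects[project][lang].get(msgid, "")

    def list_projects(self):
        return sorted(self._projects)


backend = InMemoryBackend()


# The public XML-RPC functions are thin wrappers around backend calls.
# They return True instead of None because XML-RPC cannot marshal None
# unless allow_none is enabled on the server.
def create_project(project):
    backend.add_project(project)
    return True


def create_language(project, lang):
    backend.add_language(project, lang)
    return True


def translate(project, lang, msgid, msgstr):
    backend.set_translation(project, lang, msgid, msgstr)
    return True


def lookup(project, lang, msgid):
    return backend.get_translation(project, lang, msgid)


def list_projects():
    return backend.list_projects()


def make_server(host="localhost", port=8000):
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    for func in (create_project, create_language, translate, lookup, list_projects):
        server.register_function(func)  # exposed under func.__name__
    return server
```

Calling make_server().serve_forever() would start serving.  The point
is that a database-backed implementation could replace InMemoryBackend
without any change to the server script.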

It's a pretty basic requirement for any component: independence.  Does
your new design offer that?

There is also a more technical question about the nature of the
backend.  It can be procedural (you call functions that return
things), or it can be object-oriented: a few functions return "basic"
objects defined by interfaces rather than by implementation, and you
then operate on those objects to get subobjects, save changes, and so
on.  Which approach are you using?

A few more notes:

Trying to stick to a standard format (XLIFF, .po or anything else)
for backend storage is not a good idea, because it will be
unnecessarily limiting.  The standards have their own specifics which
we may not care about, so there is the overhead of storing things the
'standard' way even when it is not convenient.  Eventually the format
will simply not be enough.  We will want to keep a lot more metadata
(e.g., string history, string submitter, date, ...).  Storing that in
external files, separated from the actual data, will become
increasingly awkward.

The fear that files will be changed unnecessarily is unfounded.  You
already deserialize the files into objects and later serialize them
back.  In this regard there is no difference between storing the
serialized form and storing the deserialized one.

Anyway, from an engineering point of view, there is 'primary' data,
and there are 'views' on that data.  Do not confuse the two!  XLIFF,
.po or .html are just views.  Data is just that, data; it has no
connection to a format until you serialize it.  (An RDB is attractive
because you don't have to serialize your data and commit to a format.)

The 'speed' argument is not valid either, because these 'views' are
trivial to cache once the 'base' data and the 'derivative' data are
separated.
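
One simple way to cache views, sketched under the assumption that
every change to the base data bumps a revision counter: a rendered
view is reused until the revision changes.  TranslationStore,
ViewCache and the tab-separated render_tsv format are hypothetical.

```python
from typing import Callable, Dict, Tuple


class TranslationStore:
    """Hypothetical holder of primary data with a revision counter."""

    def __init__(self) -> None:
        self.units: Dict[str, str] = {}
        self.revision = 0

    def set(self, msgid: str, msgstr: str) -> None:
        self.units[msgid] = msgstr
        self.revision += 1  # any change to base data makes derived views stale


class ViewCache:
    """Keeps one rendered view per format, tagged with the revision it was built from."""

    def __init__(self, store: TranslationStore,
                 renderers: Dict[str, Callable[[Dict[str, str]], str]]) -> None:
        self._store = store
        self._renderers = renderers
        self._cache: Dict[str, Tuple[int, str]] = {}
        self.renders = 0  # how many times a view was actually rebuilt

    def get(self, fmt: str) -> str:
        cached = self._cache.get(fmt)
        if cached is not None and cached[0] == self._store.revision:
            return cached[1]
        text = self._renderers[fmt](self._store.units)
        self._cache[fmt] = (self._store.revision, text)
        self.renders += 1
        return text


def render_tsv(units: Dict[str, str]) -> str:
    """A trivial tab-separated 'view' of the units, sorted by msgid."""
    return "".join(f"{msgid}\t{msgstr}\n" for msgid, msgstr in sorted(units.items()))
```

Repeated downloads of an unchanged project then cost a dictionary
lookup; only the first request after an edit pays for rendering.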

I maintain that an RDBMS is the right tool for the job here, and
reinventing one will be painful.  Gasper outlined a stack that could
bring a few of the advantages of an RDBMS, but also significant
complexity and fragility.  For example, I do not think NFS is a good
choice here; it is by no means transactional.  And that stack would
still not give us good random-access speed, convenient network
transparency, or reliable transactions.

Frankly, I am extremely puzzled by what I perceive to be hostility
towards RDBMSes.  Some people seem willing to jump through many hoops
to defend file-based approaches.  In fact, I found conflicting advice
in the wiki itself:

http://translate.sourceforge.net/wiki/pootle/metadata advocates a DB
http://translate.sourceforge.net/wiki/wordforge/file_system advocates files

(BTW, the second link also contains misinformation: it cites
migrating between database versions as an argument against an RDBMS,
but this can easily be done automatically.  It also assumes that
metadata can be extracted from the data, which is (or will soon be)
incorrect.)
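
As an illustration of how routine automatic migration is, here is a
sketch that uses SQLite's PRAGMA user_version to record the schema
version and applies only the missing steps, each in its own
transaction.  The MIGRATIONS list is hypothetical.

```python
import sqlite3

# Hypothetical schema history: MIGRATIONS[i] upgrades version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE unit (id INTEGER PRIMARY KEY, msgid TEXT NOT NULL,"
    " msgstr TEXT NOT NULL DEFAULT '')",
    "ALTER TABLE unit ADD COLUMN submitter TEXT",
    "CREATE INDEX unit_msgid ON unit (msgid)",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply any pending migrations and return the resulting schema version."""
    conn.isolation_level = None  # we issue BEGIN/COMMIT ourselves
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    for step in range(version, len(MIGRATIONS)):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[step])
            # The version bump commits or rolls back together with the change.
            conn.execute(f"PRAGMA user_version = {step + 1}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running migrate() on every startup is safe: an up-to-date database is
left untouched, and an old one is brought forward step by step.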

Phew.  I just had to write this out to get it out of my head.  I am
feeling a little frustrated.  I sincerely want to help this project as
much as I can, and I feel that these fundamental issues must be resolved
before I get deep into the technical details.

Best regards,
-- 
Gintautas Miliauskas
http://gintasm.blogspot.com
