
Re: [translate-pootle] [Fwd: Re: Wordforge]



Hi Gintautas

Gintautas Miliauskas wrote:

I do not really care about the actual implementation of the backend.
The reason I am advocating an RDBMS is to influence the design of the
backend.  In the current situation, an API that just presents parsed
files and writes objects back would do, but it would not be possible to
have an efficient backend based on a database this way.
Independently of the backend storage, FOSS projects and translators (the users and customers of the system) work with files, which must interact with Gettext, translation editors, etc. If the API does not produce files, then something on top of it will have to, which adds an unnecessary layer. The same applies to XML-RPC: if we want to exchange data between servers, this data will relate to specific projects, and it will have to be grouped into the files that the project produces.

Only in the case of the on-line editor might strings be addressed directly, without having to construct the file, but contextual information will still be necessary in many cases. For example, when translating a segmented text (such as a help page), it is important to know the preceding sentences and the ones that follow.

Here's a single simple functional requirement that I propose for the
backend:

=============
It should be possible to write a separate program which should be able
to do everything possible only using the backend object.  No
assumptions on files, no nothing, just manipulating objects.  I would
not have to care about locking, other servers running in parallel
using the database, the OS, the filesystem or anything else.  That also
includes uploads of new languages / new projects.
=============

Specifically, say I want an XML-RPC server.  I just write a Python
script that imports some classes from pootle.backend (or whatever),
defines some functions and makes them public.
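
A minimal sketch of the script described above, assuming a purely hypothetical backend object. `InMemoryBackend`, its methods and `build_server` are illustrative names, not part of Pootle; only `SimpleXMLRPCServer` is a real standard-library class.

```python
from xmlrpc.server import SimpleXMLRPCServer


class InMemoryBackend:
    """Hypothetical stand-in for a backend object; not Pootle's real API."""

    def __init__(self):
        # Maps (project, language) to {source string: translation}.
        self._units = {}

    def add_project(self, project, language, sources):
        # Register a new project/language pair with empty translations.
        self._units[(project, language)] = {s: "" for s in sources}

    def get_untranslated(self, project, language):
        # Return the source strings that have no translation yet.
        units = self._units[(project, language)]
        return [s for s, t in units.items() if not t]

    def set_translation(self, project, language, source, target):
        # Store a translation; the caller never touches files or locks.
        self._units[(project, language)][source] = target
        return True


def build_server(backend, host="localhost", port=8000):
    # Expose selected backend methods over XML-RPC.
    server = SimpleXMLRPCServer((host, port), allow_none=True)
    server.register_function(backend.get_untranslated, "get_untranslated")
    server.register_function(backend.set_translation, "set_translation")
    return server
```

Calling `build_server(backend).serve_forever()` would start serving requests. The script only talks to the backend object, so swapping the in-memory dictionary for files or a database would not change it.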

It's a pretty basic requirement for any component: independence.  Does
your new design offer that?
Independence of...  ?

There is also a more technical question of the nature of the backend.
It can be procedural (you call functions which return things), or it
can be object-oriented: there are a few functions that return "basic"
objects defined by interfaces rather than by implementation.  Then
you operate on those to get subobjects, to save  the changes to the
objects, etc.  Which approach are you using?
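
To make the object-oriented option concrete, here is a rough sketch under stated assumptions: all class and function names (`TranslationUnit`, `TranslationStore`, `DictUnit`, `DictStore`, `open_store`) are hypothetical and do not describe the actual Pootle design. One entry function returns an object known only by its interface, and the caller works on sub-objects and saves changes through them.

```python
from abc import ABC, abstractmethod


class TranslationUnit(ABC):
    """Interface for one translatable string (hypothetical)."""

    @property
    @abstractmethod
    def source(self): ...

    @abstractmethod
    def set_target(self, text): ...

    @abstractmethod
    def save(self): ...


class TranslationStore(ABC):
    """Interface for a collection of units (hypothetical)."""

    @abstractmethod
    def units(self): ...


class DictUnit(TranslationUnit):
    """One possible implementation, backed by a plain dict."""

    def __init__(self, table, source):
        self._table = table
        self._source = source
        self._pending = table[source]

    @property
    def source(self):
        return self._source

    def set_target(self, text):
        # The change is only staged until save() is called.
        self._pending = text

    def save(self):
        self._table[self._source] = self._pending


class DictStore(TranslationStore):
    def __init__(self, table):
        self._table = table

    def units(self):
        return [DictUnit(self._table, s) for s in self._table]


def open_store(table):
    # Entry point: callers rely on the TranslationStore interface only.
    return DictStore(table)
```

A file-based or database-based store could implement the same two interfaces, which is the property the question above is about.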

A few more notes:

Trying to stick to a standard format (XLIFF, .po or anything else)
for backend storage is not a good idea because it will be unnecessarily
limiting.  The standards have their own specifics which we may not
care about, so there's the overhead of storing things the 'standard'
way, even if it is not convenient.  Eventually the format will simply
not be enough.  We will want to keep lots more metadata (e.g.,
string history, string submitter, date, ...).  Storing that in external
files, separated from the actual data, will be increasingly
uncomfortable.
You should read the standards. They are made by people who have been working in localisation for many years, have an extremely clear understanding of what is necessary, how to structure it and how it should be encoded. We DO care about them. If there were no PO standard, each FOSS application would have to write its own translation editor, and we would not be here now. We produce standard files so that standard translation editors can be developed and used. There is no modern computer science without standards.

Standards are under constant review and evolution, making sure that new types of data that might be necessary are supported. There is no such thing as "eventually the format will not be enough". Besides standard extensions, XLIFF and many other XML formats (such as, say, OpenDocument) allow user extensions, for the cases in which the people who defined the format might have left something out. We are working with XLIFF 1.1, but XLIFF 2.0 is being worked on, even if very few changes will take place, all of them backwards compatible.

The files vs. DB backend debate is, nevertheless, independent of the use of standards.

Anyway, from the engineering point of view, there is 'primary' data,
and there are 'views' on that data.  Do not confuse the two!
XLIFF, .po or .html are just views.  Data is just that, data, it has no
connection to a format until you serialize it.  (An RDB is attractive
as you don't have to serialize your data and commit to a format.).
You DO commit to a format; it just happens to have very efficient ways of handling data (in general).
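
The data/view distinction under discussion can be illustrated with a small sketch. It assumes a hypothetical `Unit` record with one metadata field (`submitter`); `po_escape` and `to_po` are illustrative helpers, not functions from Pootle or Gettext. The function renders a minimal PO-style view (`msgid`/`msgstr` pairs only) and simply leaves the extra metadata out.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    source: str
    target: str
    submitter: str  # metadata kept in the primary data, not emitted below


def po_escape(text):
    # Escape backslashes, double quotes and newlines for a PO string.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_po(units):
    # Render the units as one PO-style "view" of the primary data.
    entries = []
    for u in units:
        entries.append(
            f'msgid "{po_escape(u.source)}"\nmsgstr "{po_escape(u.target)}"'
        )
    return "\n\n".join(entries) + "\n"
```

Whatever the storage is, the stored records themselves still have a layout (here the fields of `Unit`), which is the sense in which a format is always chosen.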

Frankly I am extremely puzzled with what I perceive to be a hostile
look towards RDBMSes.  Some seem to be willing to jump through many
hoops to defend file-based approaches.  In fact I found conflicting
advice in the wiki itself:
Please do not confuse being careful with being against something, and please do not use words such as "hostile". Some of us have been working on localisation for quite a number of years, as well as on development, databases and the development of standards. We understand the complete set of data that needs to be managed, something on which some of us have been working for quite a while. The DB vs. files approach has been discussed innumerable times, and we are quite aware of the advantages of databases... and of their problems. Any argument that you might put forward has already been used internally, by people who have been using DBs for quite a while (some of us for 20 years). The use of files has quite a number of advantages, and the project has followed this line of development, which we question often, but never strongly enough to abandon current developments and change. The issue of scalability requires that we look at the DB approach. Having said this, your forceful approach on DB demands black/white answers (agree/do not agree) on a subject that for us is much more complex.

If there is change, it will not be tomorrow. It requires clear planning and some assurance that the new approach is better, which we will only have through experience. This is why I proposed in my previous mail developing an experimental second DB-based back-end (which we are prepared to fund), to ensure that all data can be easily mapped and that it works better. If it turns out to be clearly better, we will be the first ones to go for it.

I am
feeling a little frustrated.  I sincerely want to help this project as
much as I can, and I feel that these fundamental issues must be resolved
before I get deep into the technical details.
The work that was planned for your SoC project was very clear, and does not require any decision on the technology of the back-end. It will help the implementation of different approaches, but those approaches do not need to be decided now (even if work on figuring out whether they are better can start immediately).

Your opinion on the back-end is important, as are many others', but please remember that there are other people involved, and that there are reasons why we do things the way we do them. At some point we might need to change the way things are done, but we need to be sure that we are moving to a better approach. Opinions are not enough.

Javier





