Re: [translate-pootle] [Fwd: Re: Wordforge]

To: Gintautas Miliauskas <gintas@akl.lt>
Cc: F Wolff <viool@webmail.co.za>, translate-pootle@lists.sourceforge.net, debian-i18n@lists.debian.org
Subject: Re: [translate-pootle] [Fwd: Re: Wordforge]
From: Javier SOLA <javier@Khmeros.info>
Date: Sat, 10 Jun 2006 19:57:30 +0700
Message-id: <448AC1BA.6060002@Khmeros.info>
In-reply-to: <20060610002045.3791d2b5@localhost.localdomain>
References: <1149845389.4481.11.camel@localhost.localdomain> <20060610002045.3791d2b5@localhost.localdomain>

Hi Gintautas

Gintautas Miliauskas wrote:

I do not really care about the actual implementation of the backend.
The reason I am advocating an RDBMS is to influence the design of the
backend.  In the current situation, an API that just presents parsed
files and writes objects back would do, but it would not be possible to
have an efficient backend based on a database this way.

Independently of the backend storage, FOSS projects and translators (the users and customers of the system) work with files, which must interact with Gettext, translation editors, etc. If the API does not produce files, then another layer on top will have to do it, which adds an unnecessary layer. The same goes for XML-RPC. If we want to exchange data between servers, this data will relate to specific projects, and it will have to be grouped into the files that the project produces.

Only in the case of the on-line editor might strings be addressed directly, without having to construct the file, but contextual information will still be necessary in many cases. For example, when translating a segmented text (such as a help page), it is important to know the preceding sentences and the ones that follow.
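
As a rough illustration of that last point, here is a minimal sketch of fetching one segment together with its neighbours. The function name `unit_with_context` and the `window` parameter are hypothetical, invented for this example; nothing here is taken from the Pootle code.

```python
from typing import List, Tuple


def unit_with_context(
    segments: List[str], index: int, window: int = 1
) -> Tuple[List[str], str, List[str]]:
    """Return (preceding, current, following) for the segment at `index`.

    `window` is the number of neighbouring segments to include on each side.
    """
    if not 0 <= index < len(segments):
        raise IndexError("segment index out of range")
    before = segments[max(0, index - window):index]
    after = segments[index + 1:index + 1 + window]
    return before, segments[index], after


# A segmented help page: the middle sentence is ambiguous on its own.
help_page = [
    "Open the File menu.",
    "Select it.",
    "Then press Save.",
]
before, current, after = unit_with_context(help_page, 1)
```

Even an editor that addresses strings one at a time would need something like this, which in practice means knowing the order of the units in the file they came from.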

Here's a single simple functional requirement that I propose for the
backend:

=============
It should be possible to write a separate program which should be able
to do everything possible only using the backend object.  No
assumptions on files, no nothing, just manipulating objects.  I would
not have to care about locking, other servers running in parallel
using the database, the OS, the filesystem or anything else.  That also
includes uploads of new languages / new projects.
=============

Specifically, say I want an XML-RPC server.  I just write a Python
script that imports some classes from pootle.backend (or whatever),
defines some functions and makes them public.
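
A minimal sketch of what such a script might look like, using only the Python standard library. The `InMemoryBackend` class and its two methods are hypothetical stand-ins for whatever pootle.backend would eventually expose; they are assumptions made for illustration, not an existing API.

```python
from xmlrpc.server import SimpleXMLRPCDispatcher, SimpleXMLRPCServer


class InMemoryBackend:
    """Hypothetical stand-in for a storage-agnostic backend object."""

    def __init__(self):
        # (project, language, source string) -> translation
        self._units = {}

    def set_translation(self, project, language, source, target):
        self._units[(project, language, source)] = target
        return True

    def get_translation(self, project, language, source):
        # XML-RPC cannot marshal None by default, so return "" when missing.
        return self._units.get((project, language, source), "")


def publish(dispatcher: SimpleXMLRPCDispatcher, backend: InMemoryBackend) -> None:
    """Expose the backend methods under public XML-RPC names."""
    dispatcher.register_function(backend.get_translation, "get_translation")
    dispatcher.register_function(backend.set_translation, "set_translation")


def make_server(backend, host="localhost", port=8000):
    """Build (but do not start) an XML-RPC server around the backend."""
    server = SimpleXMLRPCServer((host, port))
    publish(server, backend)
    return server


# To run it: make_server(InMemoryBackend()).serve_forever()
```

The script itself never touches files or locks; whether the object behind it reads PO files or queries a database is invisible at this level.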

It's a pretty basic requirement for any component: independence.  Does
your new design offer that?

Independence of...  ?

There is also a more technical question of the nature of the backend.
It can be procedural (you call functions which return things), or it
can be object-oriented: there are a few functions that return "basic"
objects defined by interfaces rather than by implementation.  Then
you operate on those to get subobjects, to save  the changes to the
objects, etc.  Which approach are you using?
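
To make the object-oriented option concrete, here is a small sketch in which client code depends only on interfaces. All names (`Unit`, `Store`, `DictUnit`, `DictStore`, `fill_untranslated`) are hypothetical and chosen for this example; the dictionary-backed classes merely stand in for a file-based or database-based implementation.

```python
from typing import Dict, List, Protocol


class Unit(Protocol):
    """Interface for one translatable string."""
    source: str
    target: str

    def save(self) -> None: ...


class Store(Protocol):
    """Interface for a collection of units (a file, a table, ...)."""

    def units(self) -> List[Unit]: ...


class DictUnit:
    """One implementation of Unit, backed by a plain dictionary."""

    def __init__(self, table: Dict[str, str], source: str):
        self._table = table
        self.source = source
        self.target = table.get(source, "")

    def save(self) -> None:
        self._table[self.source] = self.target


class DictStore:
    """One implementation of Store; a file or DB version could replace it."""

    def __init__(self, table: Dict[str, str]):
        self._table = table

    def units(self) -> List[Unit]:
        return [DictUnit(self._table, source) for source in self._table]


def fill_untranslated(store: Store, placeholder: str) -> int:
    """Client code: written against the interfaces only."""
    changed = 0
    for unit in store.units():
        if not unit.target:
            unit.target = placeholder
            unit.save()
            changed += 1
    return changed
```

A procedural API would instead offer calls such as a function taking a project name and a source string and returning the translation, with no objects to hold on to between calls.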

A few more notes:

Trying to stick to a standard format (XLIFF, .po or anything else)
for backend storage is not a good idea because it will be unnecessarily
limiting.  The standards have their own specifics which we may not
care about, so there's the overhead of storing things the 'standard'
way, even if it is not convenient.  Eventually the format will simply
not be enough.  We will want to keep lots more metadata (e.g.,
string history, string submitter, date, ...).  Storing that in external
files, separated from the actual data, will be increasingly
uncomfortable.

You should read the standards. They are made by people who have been working in localisation for many years and have an extremely clear understanding of what is necessary, how to structure it and how it should be encoded. We DO care about them. If there were no PO standard, each FOSS application would have to write its own translation editor, and we would not be here now. We produce standard files so that standard translation editors can be developed and used. There is no modern computer science without standards.

Standards are under constant review and evolution, making sure that new types of data that might be necessary are supported. There is no such thing as "eventually the format will not be enough". Besides standard extensions, XLIFF and many other XML formats (such as, say, OpenDocument) allow user extensions, for the cases in which the people who define the format might have left something out. We are working with XLIFF 1.1, but XLIFF 2.0 is being worked on, even if very few changes will take place, and all of them backwards compatible.

The debate on a files or DB backend is, nevertheless, independent of the use of standards.

Anyway, from the engineering point of view, there is 'primary' data,
and there are 'views' on that data.  Do not confuse the two!
XLIFF, .po or .html are just views.  Data is just that, data, it has no
connection to a format until you serialize it.  (An RDB is attractive
as you don't have to serialize your data and commit to a format.).

You DO commit to a format; it just happens to have very efficient ways of handling data (in general).
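
The data/view distinction under discussion can be sketched as follows: the same list of (source, target) pairs is rendered once as PO entries and once as an XLIFF-style `<body>` fragment. The function names are hypothetical, and the XLIFF output is deliberately only a fragment; a valid XLIFF 1.1 document would also need the enclosing `<xliff>` and `<file>` elements with their required attributes.

```python
import xml.etree.ElementTree as ET


def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_po(units):
    """Render (source, target) pairs as PO msgid/msgstr entries."""
    entries = []
    for source, target in units:
        entries.append(
            'msgid "%s"\nmsgstr "%s"\n' % (po_escape(source), po_escape(target))
        )
    return "\n".join(entries)


def to_xliff_body(units):
    """Render the same pairs as an XLIFF-style <body> fragment."""
    body = ET.Element("body")
    for number, (source, target) in enumerate(units, start=1):
        trans_unit = ET.SubElement(body, "trans-unit", id=str(number))
        ET.SubElement(trans_unit, "source").text = source
        ET.SubElement(trans_unit, "target").text = target
    return ET.tostring(body, encoding="unicode")


units = [("File", "Fichier")]
po_view = to_po(units)
xliff_view = to_xliff_body(units)
```

Note that the list of tuples is itself a format of sorts: whichever structure holds the primary data, something has to define its fields and their meaning.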

Frankly I am extremely puzzled with what I perceive to be a hostile
look towards RDBMSes.  Some seem to be willing to jump through many
hoops to defend file-based approaches.  In fact I found conflicting
advice in the wiki itself:

Please do not confuse being careful with being against something, and please do not use words such as "hostile". Some of us have been working on localisation for quite a number of years, as well as on development, databases and the development of standards. We understand the complete set of data that needs to be managed, something on which some of us have been working for quite a while. The DB vs. files approach has been discussed innumerable times; we are quite aware of the advantages of databases, and of their problems. Any argument that you might put forward has already been raised internally, by people who have been using DBs for quite a while (some of us for 20 years). The use of files has quite a number of advantages, and the project has followed this line of development, which we question often, but never strongly enough to abandon current developments and change. The issue of scalability requires that we look at the DB approach. Having said this, your forceful approach on DB demands black/white answers (agree/do not agree) on a subject that for us is much more complex.

If there is change, it will not be tomorrow. It requires clear planning and some assurance that the new approach is better, which we will only have through experience. This is why I proposed in my prior mail developing an experimental second DB-based back-end (which we are prepared to fund), to ensure that all data can be easily mapped and that it works better. If it turns out to be clearly better, we will be the first ones to go for it.
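
The kind of mapping check such an experiment would start with could look like the sketch below: units from one file are stored in a table and then read back grouped by file, and the round trip must be lossless. The schema and the function names are assumptions made up for this illustration (using SQLite from the Python standard library); a real experiment would have to carry far more metadata.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a single, hypothetical unit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE unit (
               project  TEXT NOT NULL,
               filename TEXT NOT NULL,
               position INTEGER NOT NULL,
               source   TEXT NOT NULL,
               target   TEXT NOT NULL,
               PRIMARY KEY (project, filename, position)
           )"""
    )
    return conn


def import_file(conn, project, filename, units):
    """Store (source, target) pairs, keeping their order within the file."""
    rows = [
        (project, filename, position, source, target)
        for position, (source, target) in enumerate(units)
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO unit VALUES (?, ?, ?, ?, ?)", rows)


def export_file(conn, project, filename):
    """Read the units of one file back, in their original order."""
    cursor = conn.execute(
        "SELECT source, target FROM unit "
        "WHERE project = ? AND filename = ? ORDER BY position",
        (project, filename),
    )
    return [(source, target) for source, target in cursor]


conn = open_store()
original = [("File", "Fichier"), ("Edit", "")]
import_file(conn, "demo", "menu.po", original)
round_tripped = export_file(conn, "demo", "menu.po")
```

The file name and position columns are there because the projects still need their files back, in order, whatever the storage is.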

I am
feeling a little frustrated.  I sincerely want to help this project as
much as I can, and I feel that these fundamental issues must be resolved
before I get deep into the technical details.

The work that was planned for your SoC project was very clear, and does not require any decision on the technology of the back-end. It will help the implementation of different approaches, but those approaches do not need to be decided now (even if work on figuring out whether they are better can start immediately).

Your opinion on the back-end is important, as are many others, but please remember that there are other people involved, and that there are reasons why we do things the way we do them. At some point we might need to change the way things are done, but we need to be sure that we are moving to a better approach. Opinions are not enough.


Javier
