Hello,

Thanks for your letter; it helped me understand some of the motivation for preserving files, namely that the current system basically works with files as such. I am just not sure that this is a viable strategy in the long term.

For some reason people pay no attention at all to the use case of online web translation. It may not be the primary goal, but I am sure that if we do it right it will be immensely popular. The hidden reasoning is that "it's not used now, and generally things don't change". Well, such a thing does not exist, and that's why it's not used. Look at Rosetta: it has huge flaws (no search on strings, slow uploads, an outrageously complex .po download procedure, ...), and yet it is far from unpopular.

For now I would like to concentrate on the backend. It is the base of the system and therefore important, even if it does not directly affect the functional specifications. In my previous letter I put the important part (about the API) last, and most people seemed to miss it and focused on RDBMSes instead. I do not really care about the actual implementation of the backend. The reason I am advocating an RDBMS is to influence the design of the backend. In the current situation, an API that just presents parsed files and writes objects back would do, but it would not be possible to build an efficient database-backed backend behind such an API.

Here's a single, simple functional requirement that I propose for the backend:

=============
It should be possible to write a separate program that can do everything using only the backend object. No assumptions about files, nothing of the sort, just manipulating objects. I would not have to care about locking, other servers using the database in parallel, the OS, the filesystem or anything else. That also includes uploads of new languages / new projects.
=============

Specifically, say I want an XML-RPC server.
I just write a Python script that imports some classes from pootle.backend (or whatever), defines some functions and makes them public. That is a pretty basic requirement for any component: independence. Does your new design offer that?

There is also a more technical question about the nature of the backend. It can be procedural (you call functions which return things), or it can be object-oriented: there are a few functions that return "basic" objects defined by interfaces rather than by implementation. You then operate on those objects to get subobjects, to save changes, etc. Which approach are you using?

A few more notes:

Trying to stick to a standard format (XLIFF, .po or anything else) for backend storage is not a good idea, because it will be unnecessarily limiting. Each standard has its own specifics which we may not care about, so there is the overhead of storing things the 'standard' way even where that is not convenient. Eventually the format will simply not be enough. We will want to keep much more metadata (e.g., string history, string submitter, date, ...). Storing that in external files, separate from the actual data, will become increasingly uncomfortable.

The fear that files will be changed unnecessarily is unfounded. You already deserialize the files into objects and later serialize them back. In this regard there is no difference between storing the serialized form and storing the deserialized one.

Anyway, from an engineering point of view, there is 'primary' data, and there are 'views' on that data. Do not confuse the two! XLIFF, .po or .html are just views. Data is just that, data; it has no connection to a format until you serialize it. (An RDB is attractive because you don't have to serialize your data and commit to a format.) The 'speed' argument is also not valid, because these 'views' are trivial to cache once the 'base' data and the 'derivative' data are separated. I maintain that an RDBMS is the right tool for the job here, and reinventing it will be painful.
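A minimal sketch of what the requirement above could look like, assuming an RDBMS-backed, object-oriented backend. Everything in it is hypothetical: `Backend`, `add_unit`, `translate`, `search`, `po_view` and `make_xmlrpc_server` are illustrative names, not the actual pootle.backend API, and Python's bundled `sqlite3` merely stands in for whatever database would really be used. The table rows are the primary data (metadata such as submitter and date are just more columns); the `.po` text is only a derived view, cached until the next write; and the XML-RPC "separate program" needs nothing but the backend object.

```python
import sqlite3
from xmlrpc.server import SimpleXMLRPCServer


def _po_quote(text):
    """Quote a string for a .po file (escapes backslash, double quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


class Backend:
    """Hypothetical backend object: the only thing a client program touches."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        # Primary data lives in a table; metadata is simply more columns.
        self._db.execute(
            """CREATE TABLE IF NOT EXISTS unit (
                   id INTEGER PRIMARY KEY,
                   project TEXT NOT NULL,
                   lang TEXT NOT NULL,
                   source TEXT NOT NULL,
                   target TEXT NOT NULL DEFAULT '',
                   submitter TEXT,
                   changed TEXT DEFAULT CURRENT_TIMESTAMP)"""
        )
        self._views = {}  # cache of derived views, keyed by (format, project, lang)

    def add_unit(self, project, lang, source):
        with self._db:  # one transaction: commit on success, roll back on error
            cur = self._db.execute(
                "INSERT INTO unit (project, lang, source) VALUES (?, ?, ?)",
                (project, lang, source),
            )
        self._views.clear()  # any cached view may now be stale
        return cur.lastrowid

    def translate(self, unit_id, target, submitter):
        with self._db:
            self._db.execute(
                "UPDATE unit SET target = ?, submitter = ?, "
                "changed = CURRENT_TIMESTAMP WHERE id = ?",
                (target, submitter, unit_id),
            )
        self._views.clear()

    def search(self, project, lang, text):
        # Searching strings is a plain query; no file parsing involved.
        pattern = f"%{text}%"
        return self._db.execute(
            "SELECT id, source, target FROM unit WHERE project = ? AND lang = ? "
            "AND (source LIKE ? OR target LIKE ?) ORDER BY id",
            (project, lang, pattern, pattern),
        ).fetchall()

    def po_view(self, project, lang):
        # The .po text is a view rendered from the data, cached until the next write.
        key = ("po", project, lang)
        if key not in self._views:
            rows = self._db.execute(
                "SELECT source, target FROM unit WHERE project = ? AND lang = ? ORDER BY id",
                (project, lang),
            )
            self._views[key] = "\n".join(
                f"msgid {_po_quote(src)}\nmsgstr {_po_quote(tgt)}\n" for src, tgt in rows
            )
        return self._views[key]


def make_xmlrpc_server(backend, host="localhost", port=8000):
    """A separate program: expose backend methods over XML-RPC, nothing else."""
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_function(backend.search, "search")
    server.register_function(backend.translate, "translate")
    server.register_function(backend.po_view, "po_view")
    return server
```

A separate program would then only need something like `make_xmlrpc_server(Backend("pootle.db")).serve_forever()`; it never deals with files or locking itself. The `.po` rendering is deliberately simplified (no header entry, plural forms or comments), and the view cache is simply dropped on every write, which is the cheapest correct invalidation rule for a sketch.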
Gasper outlined a stack that could bring a few of the advantages of an RDBMS, but also significant complexity and fragility. For example, I do not think NFS is a good choice here; it is by no means transactional. And that stack would still not give us good random-access speed, convenient network transparency, or reliable transactions.

Frankly, I am extremely puzzled by what I perceive to be hostility towards RDBMSes. Some seem willing to jump through many hoops to defend file-based approaches. In fact, I found conflicting advice in the wiki itself:

http://translate.sourceforge.net/wiki/pootle/metadata advocates a DB
http://translate.sourceforge.net/wiki/wordforge/file_system advocates files

(BTW, the last link also contains misinformation about migrating database versions as an argument against an RDBMS. This can easily be done automatically. In addition, it assumes that metadata can be extracted from the data, which is (or will soon be) incorrect.)

Phew. I just had to write this out to get it out of my head. I am feeling a little frustrated. I sincerely want to help this project as much as I can, and I feel that these fundamental issues must be resolved before I get deep into the technical details.

Best regards,
-- 
Gintautas Miliauskas
http://gintasm.blogspot.com