Re: Google summer of code: i18n infrastructure

To: debian-i18n@lists.debian.org
Subject: Re: Google summer of code: i18n infrastructure
From: Gintautas Miliauskas <gintas@akl.lt>
Date: Fri, 12 May 2006 15:40:50 +0300
Message-id: <20060512154050.6da3aaad@localhost.localdomain>
In-reply-to: <200605120915.49203.zejn@owca.info>
References: <20060502223605.0xja4gkgowcwgow4@posta.owca.info> <20060511061953.GA19876@djedefre.onera> <20060511134312.5344999e@localhost.localdomain> <200605120915.49203.zejn@owca.info>

Hello,

> One thing the debian people were sceptical about when evaluating
> pootle as a possible candidate was its high ram usage. How is this
> expected to be with zope? Would integration to plone be possible?

I have also heard bad rumours about pootle's resource eating habits,
but I have not looked into it closer yet.

As for Zope, I proposed Zope 3 as the framework, which is different in
many ways from Zope 2 (in short, Zope 2 is more CMS-ish, and Zope 3 is
more specialized-webapp-ish). Plone runs on Zope 2, so I'm afraid that
direct integration is not possible, but I don't see a big problem here.

Zope 3 is not very light either, but I think that its performance should
scale well enough for this project.

> I took a bit closer look at the pootle code and found they were using
> minidom for xml parsing, which is supposed to be quite memory hungry
> and slow. Instead of minidom I think it would be smarter to use
> ElementTree, which has a C implementation and a part of it is also
> going to be included in Python 2.5. PO file parsing could be done in
> pure Python or with the help of a C library; I was thinking about the
> python-mxtexttools package. For spellcheck there's python-enchant (in
> unstable) and aspell-python, which I don't think is in Debian yet.
> 
> So there's a lot of things just waiting to get used.

Agreed.  My point is to be very careful about "basing" the new system
on some existing software package.
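
To illustrate the ElementTree suggestion, here is a rough sketch of
incremental parsing with iterparse, which avoids building a whole DOM
in memory the way minidom does.  The XML layout (file/unit/source/
target) is a made-up, XLIFF-like sample and the iter_units helper is
hypothetical; neither is taken from Pootle's code.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XLIFF-like sample; the element names are assumptions
# made for illustration only.
SAMPLE = b"""<file>
  <unit id="1"><source>Hello</source><target>Labas</target></unit>
  <unit id="2"><source>Goodbye</source><target></target></unit>
</file>"""


def iter_units(stream):
    """Yield (id, source, target) tuples, one translation unit at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "unit":
            yield (
                elem.get("id"),
                elem.findtext("source", default=""),
                elem.findtext("target", default=""),
            )
            # Discard the children and text of the unit we just handled,
            # so memory use does not grow with the size of the file.
            elem.clear()


units = list(iter_units(io.BytesIO(SAMPLE)))
untranslated = [uid for uid, _src, tgt in units if not tgt]
```

The same loop works on an open file object, so a large file never has
to be held in memory as a full tree.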

> One thing I am not sure about and probably needs more discussion is
> how would the storage get implemented; would it be a separate svn
> repository or just an API bridge to upstream repositories that would
> also implement format check and QC and so on. 

I am still not completely sure about the storage backend myself.  Zope
has ZODB, the Zope Object Database, which is pretty nifty but I'm not
sure if it will scale well enough.  Another possibility is to use a
relational database (PostgreSQL or MySQL) with or without an
object-relational mapper.  I have very little experience in this area
(I've only been using Zope with ZODB), so I'll have to look into this
further.

> One key thing to have in mind here is to create a
> single point of translation exchange, but not make it web interface
> only.

That was the original idea.  The first step in my plan doesn't involve
a Rosetta-like interface at all; it's just about translating,
reviewing and submitting localized files.

> This portal would have a svn repository for storage backend instead
> of bridge as this would also be alternative way of accessing
> translations for more technically knowledgeable and/or offline
> translators.

I think that it is a bit too early to consider such low-level details.
In particular, I would prefer a relational DB to a filesystem backend,
let alone Subversion.  There is going to be a hell of a lot of data
inside, and a relational database would give us indexing, fast
searches and cross-references essentially for free, not to mention
transactions, better network transparency, replication, etc.
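
As a rough sketch of what I mean by indexing, searches and
transactions, here is a toy example using Python's sqlite3 module as a
stand-in for PostgreSQL or MySQL.  The table layout, the column names
and the submit/untranslated helpers are assumptions made up for
illustration, not a proposed schema.

```python
import sqlite3

# In-memory database standing in for a real PostgreSQL/MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE translation (
           id      INTEGER PRIMARY KEY,
           package TEXT NOT NULL,
           lang    TEXT NOT NULL,
           msgid   TEXT NOT NULL,
           msgstr  TEXT NOT NULL DEFAULT '',
           UNIQUE (package, lang, msgid)
       )"""
)
# Index to speed up "what is still untranslated for language X?" queries.
conn.execute("CREATE INDEX idx_translation_lang ON translation (lang, msgstr)")


def submit(conn, rows):
    """Insert a batch of translations atomically: all rows or none."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO translation (package, lang, msgid, msgstr) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


def untranslated(conn, lang):
    """Return (package, msgid) pairs that have no translation yet."""
    cur = conn.execute(
        "SELECT package, msgid FROM translation "
        "WHERE lang = ? AND msgstr = '' ORDER BY package, msgid",
        (lang,),
    )
    return cur.fetchall()


submit(conn, [("dpkg", "lt", "Hello", "Labas"), ("dpkg", "lt", "Goodbye", "")])

# The second batch violates the UNIQUE constraint, so the whole batch
# (including its first, valid row) is rolled back.
try:
    submit(conn, [("apt", "lt", "Yes", "Taip"), ("dpkg", "lt", "Hello", "dup")])
except sqlite3.IntegrityError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM translation").fetchone()[0]
```

With plain files or a Subversion checkout, the all-or-nothing batch
and the indexed lookup would both have to be written by hand.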

> This is basically what my proposal for building Debian's i18n
> infrastructure was.

Can you make your complete proposal public too?  Maybe there are some
important ideas that I have not considered.  As I mentioned, I see it
as my goal to collect all the different ideas floating around ;)
Thanks.

I appreciate your input very much.

Best regards,
-- 
Gintautas Miliauskas
http://gintasm.blogspot.com

