
Re: Wordforge, Pootle, GSoC, Debian



On Mon, 2006-05-29 at 05:03 +0200, Michael Bramer wrote:
> On Mon, May 29, 2006 at 07:18:33AM +0700, Javier SOLA wrote:
> > Michael Bramer wrote:
> > 
> > >I would like to set up a Pootle test server (CVS version) for the
> > >translation of the Debian package descriptions.
> > >We have >10000 POT files for the Debian package descriptions. With
> > >18 languages we will get >180000 PO files.
> > Grisu,
> > 
> > This is a very interesting challenge.
> > 
> > So far we have thought of a hierarchical structure of files in
> > Pootle. For example, we can have:
> > 
> > Folder Debian
> >     Folder French
> >         Folder Debian Installer
> >              Folder level 1   ------ po files
> >                    .........
> >         Package descriptions
> >            .....
> >     Folder Khmer
> >          ....
> > Folder OpenOffice
> >    .....
> > 
> > Having 10,000 packages without structure (all in the same folder)
> > requires a different type of interface. We cannot just list all the
> > files in a folder and show their percentage of translation, or
> > calculate on the fly the total percentage of translation in the
> > folder (that would mean searching 10,000 files). We have to specify
> > an interface that will allow users to go directly to a specific
> > package, or to a partial listing that contains that package. Or
> > maybe I am wrong and there is a structure?
> 
> Yes, we need some structure. In Debian we generally use something
> like this (per source package):
>  a/
>  b/
>  c/
>  d/
>  ...
>  l/
>  liba/
>  libb/
>  libc/
>  ...
>  libz/
>  m/
>  ...
>  z/
> 
> See http://ftp.de.debian.org/debian/pool/main/ for an example.

We hadn't catered for that, i.e. a multilevel layout, as we usually just
have one top-level project directory.  We need to rethink that logic.
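
As a rough illustration of the pool-style bucketing Michael describes
(first letter, or "lib" plus one letter for library packages), here is a
minimal Python sketch. It is not Pootle code; pool_prefix and
bucket_packages are names made up for this example.

```python
from collections import defaultdict


def pool_prefix(source_package):
    """Return the pool-style bucket for a source package name.

    Packages starting with "lib" are spread over "liba" ... "libz";
    everything else goes under its first character.
    """
    if source_package.startswith("lib") and len(source_package) > 3:
        return source_package[:4]
    return source_package[:1]


def bucket_packages(names):
    """Group package names by bucket (hypothetical helper)."""
    buckets = defaultdict(list)
    for name in sorted(names):
        buckets[pool_prefix(name)].append(name)
    return dict(buckets)


if __name__ == "__main__":
    sample = ["a2ps", "a2ps-perl-ja", "libxml2", "mutt"]
    for bucket, members in bucket_packages(sample).items():
        print(bucket, members)
```

With roughly 10000 source packages this keeps each bucket to a few hundred
entries, which is far easier to list on one page than a flat folder.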

> Is it a problem to put the PO/XLIFF files in this structure and
> generate stats per subdirectory plus an overview?

Currently we create overview pages at all levels in the structure.  This
makes sense in, for instance, Mozilla, where there are many
subdirectories, e.g.

OOo/$language/avmedia/source/....

We create summaries at each level.
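
To make "summaries at each level" concrete, here is a small Python sketch
that adds per-file counts into every ancestor directory in one pass. It is
only an illustration under assumed inputs, not how Pootle computes its
statistics; rollup and SAMPLE_STATS are hypothetical names, and the counts
are invented.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Invented example data: relative file path -> (translated, total) units.
SAMPLE_STATS = {
    "descriptions/a/a2ps/fr.po": (3, 4),
    "descriptions/a/a2ps-perl-ja/fr.po": (1, 2),
    "descriptions/b/bash/fr.po": (5, 10),
}


def rollup(file_stats):
    """Sum (translated, total) into every ancestor directory of each file.

    The key "." stands for the top-level overview.
    """
    totals = defaultdict(lambda: [0, 0])
    for path, (translated, total) in file_stats.items():
        for parent in PurePosixPath(path).parents:
            totals[str(parent)][0] += translated
            totals[str(parent)][1] += total
    return {directory: tuple(counts) for directory, counts in totals.items()}


if __name__ == "__main__":
    for directory, (done, total) in sorted(rollup(SAMPLE_STATS).items()):
        print(f"{directory}: {done}/{total}")
```

Each file is visited once, so a page for any level can read its summary
from the result instead of rescanning every file below it.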

> maybe something like this:
> 
>  - Folder Debian
>     - Folder Debian Installer
>        - Folder level 1
>           - pot/po files (all languages)
>        - Folder level 2
>           - pot/po files (all languages)
>     - Package descriptions
>        - a
>           - a2ps-perl-ja
>              - pot/po files (all languages)
>           - a2ps
>              - pot/po files
>           - ...
>        - b
>        - ...
>     - NEWS
>        - a
>           - a2ps-perl-ja
>           - ...
>     - ...
> 
> IMHO this is the structure that we should use for all package-based
> translations in Debian.

Two issues that we would have to address or work around:

1) Multiple pre-language directories, e.g.
Debian/Debian_Installer/$language
2) Validation of the GNU-style translation layout (i.e. all languages in
one directory; we can manage this style of layout but we need to double
check it)
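
The difference between the two layouts can be sketched in Python as below.
This is a hedged illustration rather than Pootle's logic: language_of is a
hypothetical helper, and the set of known language codes is assumed to be
supplied by the caller.

```python
from pathlib import PurePosixPath


def language_of(path, languages):
    """Guess the language of a translation file from its path.

    Returns None when no known language code is found (for example
    for a .pot template).
    """
    p = PurePosixPath(path)
    # GNU style: all languages share one directory, e.g. po/fr.po
    if p.stem in languages:
        return p.stem
    # Directory style: the language is a path component, possibly
    # several levels down, e.g. Debian/Debian_Installer/fr/...
    for part in p.parts[:-1]:
        if part in languages:
            return part
    return None


if __name__ == "__main__":
    known = {"fr", "km"}
    print(language_of("Debian/Debian_Installer/fr/level1/base.po", known))
    print(language_of("a2ps/po/km.po", known))
    print(language_of("a2ps/po/a2ps.pot", known))
```

Checking the file name first means a GNU-style file is recognised even
when it sits several directories deep.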

> Maybe some counters:
>   - 10000 base source packages with >1 Description
>   -   800 packages with a NEWS.Debian file [1]
>   -   900 packages with debconf template files [2]
>   - 58000 man pages [3]
>   -  3200 debian web pages [4]
>   -   800 normal po files [5]
>   -   ??? debian manuals/guides/...
>   -   ??? info pages
> 
> and we have 13-65 languages ... This is a lot of work and we will only
> start with some of this. But the new system should handle this work in
> the future.

We are busy creating some performance metrics to test this volume and
doing some refactoring.  These figures are useful to us for testing the
performance issues.

> > Also, it might be interesting to wait until Pootle is working
> > internally with XLIFF files (next month) to test it, because then we
> > will have a very clear idea of the real performance, and this will
> > point very clearly at what needs to be improved (or modified). The
> > present version (0.9) uses PO internally. The new one will use XLIFF
> > internally, but will allow download and upload of PO files that are
> > generated on the fly from the XLIFF file, or merged into the XLIFF
> > files.
> 
> ok, nice.
> 
> When is 'next month'?  1 June or 1 July 2006? :-)
> 
> Do you have a timeline?

We have thrown in some work on refactoring pootlefile.py together with
our move to allow direct XLIFF work.  So there will be a bit of delay.
Our plan is to have that done by the end of the first full week in June.
We are confident that there will be lots of bugs :) so testing is needed
before we are happy to run live translations on XLIFF.
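
To give an idea of what "PO generated on the fly from XLIFF" means, here
is a deliberately minimal Python sketch using only the standard library.
It is not the pootlefile.py or Translate Toolkit implementation. It
assumes the XLIFF 1.1 namespace, and it ignores PO headers, comments,
plurals and fuzzy flags. xliff_to_po, po_quote and SAMPLE_XLIFF are
hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.1 documents; other versions use another namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}

SAMPLE_XLIFF = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="a2ps.pot" source-language="en" target-language="fr" datatype="po">
    <body>
      <trans-unit id="1"><source>Hello</source><target>Bonjour</target></trans-unit>
      <trans-unit id="2"><source>Say "hi"</source></trans-unit>
    </body>
  </file>
</xliff>"""


def po_quote(text):
    """Quote a string the way PO files expect (backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def xliff_to_po(xliff_text):
    """Turn each trans-unit into a msgid/msgstr pair; no header is written."""
    root = ET.fromstring(xliff_text)
    entries = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.findtext("x:source", default="", namespaces=XLIFF_NS)
        # A missing <target> becomes an empty msgstr (untranslated).
        target = unit.findtext("x:target", default="", namespaces=XLIFF_NS)
        entries.append(f"msgid {po_quote(source)}\nmsgstr {po_quote(target)}\n")
    return "\n".join(entries)


if __name__ == "__main__":
    print(xliff_to_po(SAMPLE_XLIFF))
```

The reverse direction (merging an uploaded PO file back into the XLIFF
store) is the harder half and is not shown here.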

> 
> 
> [1] $ zgrep -i  usr/share/doc.*/NEWS.Debian /var/cache/apt/ftp.de.debian.org_debian_dists_unstable_Contents-i386.gz  | wc -l
>      885    
> [2] $ grep Depends:.*debconf /var/lib/apt/lists/ftp.de.debian.org_debian_dists_unstable_main_binary-i386_Packages  | wc -l
>      908
> [3] $ zgrep man/man[123456789]/ /var/cache/apt/ftp.de.debian.org_debian_dists_unstable_Contents-i386.gz | wc -l
>      58844
> [4] http://www.debian.org/devel/website/stats/
> [5] $ w3m -cols 200 -dump http://www.debian.org/intl/l10n/po/fr | grep % | wc -l
>      853
> 
> Regards
> Grisu
-- 
Dwayne Bailey
Translate.org.za

+27-12-460-1095 (w)
+27-83-443-7114 (cell)


