
Re: Compendia generation



Hi,

On Wed, 29 Aug 2012 06:57:58 +0200, Christian PERRIER
<bubulle@debian.org> wrote:

> > Please, does nobody want to post or voice an opinion on this?
> 
> Hmmm, sorry, your mail came at a moment when I had other priorities,
> then I forgot to put it in my TODO list.

No problem, we all have a lot of other work ;-)

> After quickly going through your mail again (thanks for the *detailed*
> explanations, I know by experience how much time it takes to write
> such summary mails....and how frustrating it is to not get much
> feedback..:-)

It takes even more time with my poor English ;-)

> How about generating *all* kinds of compendia? This way, you don't
> have to choose: users have..:-). Size and/or CPU is not a big problem:
> that's what servers are built for..:-)

If CPU and disk usage are not a (big) problem, then everything is
possible. It seems to be a simple job.

I have downloaded almost all of the translation material (only 10
directory levels deep) to my local machine and tried it there, to get
a feel for the execution time. The original script takes about 56 min
on my machine; when I add the sed step (to remove comments) and the
msgattrib step (to filter out fuzzy messages), it takes 65 min to run
for all languages.

Another point is disk size: the sed and msgattrib steps mentioned above
(plus --use-first for msgcat) give output about 50 % smaller than the
original (from 2,152,583 kB down to 1,046,024 kB for all languages).
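
To put the two measurements above side by side, here is a small
arithmetic sketch. It only uses the figures quoted in this mail (56 and
65 minutes, 2,152,583 and 1,046,024 kB); the helper name percent_change
is made up for the illustration.

```python
# Figures quoted in this mail; everything else is plain arithmetic.
ORIGINAL_MINUTES = 56      # original script, all languages
EXTENDED_MINUTES = 65      # with the sed and msgattrib steps added
ORIGINAL_KB = 2_152_583    # total output size before the extra steps
REDUCED_KB = 1_046_024     # total output size after the extra steps


def percent_change(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


time_overhead = percent_change(ORIGINAL_MINUTES, EXTENDED_MINUTES)
size_change = percent_change(ORIGINAL_KB, REDUCED_KB)

print(f"run time:  {time_overhead:+.1f} %")   # prints "run time:  +16.1 %"
print(f"disk size: {size_change:+.1f} %")     # prints "disk size: -51.4 %"
```

So the extra cleaning costs roughly a sixth more run time and saves
slightly more than half of the disk space.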

I did not measure it (I only watched the htop output), but it seems
that most of the time is taken by the find utility, which locates all
of a language's PO files, and by some UTF-8 conversions and checks.
But this is executed only once per language.

It seems to be no problem to generate all of the types mentioned.
Basically, only two types of compendium need to be generated from
scratch (one as is and one with --use-first). All the others can be
derived from these two (some sed and msgattrib magic), and that is
then a relatively quick job (for both me and the server).
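
The "two base compendia, everything else derived" idea can be sketched
as a list of commands. This is only an illustration and not the real
script: it builds the msgcat and msgattrib command lines without
running them, it leaves out the sed comment-stripping step, and the
"usefirst" and "usefirst-nofuzzy" file names and the function names
are assumptions made for the example. Only --use-first, --no-fuzzy and
-o are real gettext options here.

```python
from typing import List


def out_name(lang: str, stamp: str, variant: str = "") -> str:
    """Build an output name like compendium-nofuzzy-sk-stamp20120829.po."""
    middle = f"{variant}-" if variant else ""
    return f"compendium-{middle}{lang}-stamp{stamp}.po"


def msgcat_cmd(po_files: List[str], output: str, use_first: bool = False) -> List[str]:
    """Command that merges all PO files of one language (the expensive step)."""
    cmd = ["msgcat"]
    if use_first:
        cmd.append("--use-first")
    return cmd + ["-o", output] + list(po_files)


def nofuzzy_cmd(source: str, output: str) -> List[str]:
    """Command that derives a fuzzy-free compendium from an existing one."""
    return ["msgattrib", "--no-fuzzy", "-o", output, source]


def plan(lang: str, stamp: str, po_files: List[str]) -> List[List[str]]:
    """Two base merges first, then the cheap derived variants."""
    as_is = out_name(lang, stamp)
    first = out_name(lang, stamp, "usefirst")  # hypothetical name
    return [
        msgcat_cmd(po_files, as_is),
        msgcat_cmd(po_files, first, use_first=True),
        nofuzzy_cmd(as_is, out_name(lang, stamp, "nofuzzy")),
        nofuzzy_cmd(first, out_name(lang, stamp, "usefirst-nofuzzy")),  # hypothetical name
    ]


for command in plan("sk", "20120829", ["a/sk.po", "b/sk.po"]):
    print(" ".join(command))
```

On a machine with gettext installed, each list could be passed to
subprocess.run(command, check=True). Only the first two commands read
all the PO files; the derived ones read a single compendium each.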

There is another question. When removing fuzzy entries there are two
options (or three):

1. remove fuzzy entries from the original compendium - this removes all
differing and outdated strings
2. remove fuzzy entries from the --use-first compendium - this removes
only outdated strings
3. of course, do both :-)
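
The difference between options 1 and 2 can be shown with a toy model.
This is not msgcat itself, only a simplified imitation of the behaviour
described above: the "as is" merge is assumed to flag a msgid as fuzzy
when its sources disagree (or when any source is already fuzzy), while
the --use-first merge is assumed to keep the first occurrence together
with its own fuzzy flag. The Entry type and the sample strings are
invented for the illustration.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    msgid: str
    msgstr: str
    fuzzy: bool = False


def merge_as_is(files: List[List[Entry]]) -> List[Entry]:
    """Toy 'as is' merge: conflicting or fuzzy sources give a fuzzy entry."""
    grouped: Dict[str, List[Entry]] = {}
    for entries in files:
        for entry in entries:
            grouped.setdefault(entry.msgid, []).append(entry)
    merged = []
    for msgid, group in grouped.items():
        conflict = len({e.msgstr for e in group}) > 1
        fuzzy = conflict or any(e.fuzzy for e in group)
        merged.append(Entry(msgid, group[0].msgstr, fuzzy))
    return merged


def merge_use_first(files: List[List[Entry]]) -> List[Entry]:
    """Toy --use-first merge: the first occurrence of each msgid wins."""
    first: Dict[str, Entry] = {}
    for entries in files:
        for entry in entries:
            first.setdefault(entry.msgid, entry)
    return list(first.values())


def drop_fuzzy(entries: List[Entry]) -> List[Entry]:
    """Toy counterpart of removing fuzzy entries."""
    return [e for e in entries if not e.fuzzy]


# Invented sample data: two packages translating the same strings.
package_a = [Entry("Open", "Otvoriť"), Entry("Close", "Zavrieť"),
             Entry("Save", "Uložiť", fuzzy=True)]   # outdated string
package_b = [Entry("Open", "Otvor"), Entry("Close", "Zavrieť")]

option_1 = drop_fuzzy(merge_as_is([package_a, package_b]))
option_2 = drop_fuzzy(merge_use_first([package_a, package_b]))

print([e.msgid for e in option_1])   # ['Close']
print([e.msgid for e in option_2])   # ['Open', 'Close']
```

In this model option 1 loses "Open" because the two packages translate
it differently, while option 2 keeps it and drops only the outdated
"Save".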

> As a user of compendia (I download them daily and use them in
> Lokalize), I have no strong opinion about which flavour would be the
> most useful for me and I would probably need to try each of
> them.....so the best option seems to just generate all of them..:-)

I am not using it, because it is mostly unusable for our language due
to a lot of differences between translations. Can you please share
your experience of how useful the translator and extracted comments
("# " & "#.") and the references ("#:") from compendia are?

Essentially, it is not necessary to make all the changes at once. We
can split them into several steps, and as a first step we can generate
the two essential compendia mentioned above, one as is and one with
the --use-first option, and then see what to do next. The change is
trivial and I made a quick attempt at it just now without any problem
(only for one language, to save some time). The additional invocation
of msgcat with some cleaning changes the execution time from 57 sec to
1 min 3 sec (one language).

The output dir now looks like this (lines will be wrapped):

  91 aug 29 08:54 compendium-nofuzzy-sk-LATEST.po -> compendium-nofuzzy-sk-stamp20120829.po
 17M aug 29 08:54 compendium-nofuzzy-sk-stamp20120829.po 
  83 aug 29 08:54 compendium-sk-LATEST.po -> compendium-sk-stamp20120829.po
 37M aug 29 08:54 compendium-sk-stamp20120829.po
171K aug 29 08:54 20120829.log

More checks will be needed here later, mostly about cleaning up the
old files, but that does not seem to be a problem.
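
One possible shape for that cleanup, as a rough sketch only: keep the
newest stamped file of each compendium, delete the older ones and
repoint the LATEST symlink. The file-name pattern is taken from the
listing above; the function name, the keep parameter and the use of
relative symlink targets are assumptions, not part of the existing
script.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches names such as compendium-nofuzzy-sk-stamp20120829.po
STAMPED = re.compile(r"^(?P<prefix>compendium-.+)-stamp(?P<stamp>\d{8})\.po$")


def prune_and_relink(out_dir: Path, keep: int = 1) -> List[str]:
    """Keep the newest `keep` stamped files per compendium, repoint LATEST.

    Returns the sorted names of the files that were deleted.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    by_prefix: Dict[str, List[Path]] = {}
    for path in out_dir.iterdir():
        match = STAMPED.match(path.name)
        if match and not path.is_symlink():
            by_prefix.setdefault(match.group("prefix"), []).append(path)

    removed: List[str] = []
    for prefix, paths in by_prefix.items():
        # YYYYMMDD stamps sort chronologically as plain strings.
        paths.sort(key=lambda p: p.name)
        old, recent = paths[:-keep], paths[-keep:]
        for path in old:
            path.unlink()
            removed.append(path.name)
        latest = out_dir / f"{prefix}-LATEST.po"
        if latest.is_symlink() or latest.exists():
            latest.unlink()
        # Relative target, so the directory can be moved or mirrored.
        latest.symlink_to(recent[-1].name)
    return sorted(removed)
```

Calling prune_and_relink(Path("/path/to/output")) after each run would
leave one stamped file and one LATEST link per compendium; the log
files are not touched by this sketch.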

regards

-- 
Slavko
http://slavino.sk
