Re: Compendia generation

To: debian-i18n@lists.debian.org
Subject: Re: Compendia generation
From: Slavko <linux@slavino.sk>
Date: Tue, 28 Aug 2012 21:52:33 +0200
Message-id: <20120828215233.0daff280@bonifac.skk>
In-reply-to: <20120814100335.00370503@bonifac.skk>
References: <20120814100335.00370503@bonifac.skk>

Hi all,

I was interested in your opinions about compendia generation. So far
I have received only two mails (outside this ML) and only one vote in
the prepared poll (http://slavino.sk/debian-kompendium). The poll is
open until 31 August 2012.

Please, does nobody want to post an opinion or vote on this?

thanks

On Tue, 14 Aug 2012 10:03:35 +0200 Slavko <linux@slavino.sk> wrote:

> Hi all,
> 
> I am starting a new thread for this, because it is a separate
> problem.
> 
> On Sun, 12 Aug 2012 17:30:38 +0200 Christian PERRIER
> <bubulle@debian.org> wrote:
> 
> > > What is your opinion, please?  
> > 
> > All these ideas seem to be good ideas.
> 
> OK, I am now able to set up my machine to generate the compendia
> locally, for testing purposes. I think I understand the ideas behind
> the scripts (I hope), but some questions remain.
> 
> There are several options for generating the compendia and, because
> I am looking for the proper solution, I want to ask everyone which
> way is best. I see these options:
> 
> 1, Leave the compendium as it is (One, as is)
> =============================================
> 
> I will summarize the pros and cons of the current state; this mixes
> well-known facts with my own opinions.
> 
> Currently the script that generates the compendium takes all
> available PO files for the given language (converting them where
> needed; I want to discuss this separately) and merges them into one
> big PO file with the msgcat tool, without any further manipulation.
> 
> (1) The resulting compendium contains:
>  * information about all input files,
>  * comments from all input files,
>  * merged headers from all input files,
>  * all source file references from all input files.
> 
> I consider all lines that start with "#" (except flags) useless for
> initializing a translation from scratch or for updating an already
> existing translation.
> 
> (2) Apart from this information, the final compendium merges all
> differing translations of a message. The result is a fuzzy message
> that lists all files in which this msgid exists and all forms of the
> translation (really all; identical ones appear several times). It
> also contains all untranslated strings from all input files.
> 
> These fuzzy messages (except the really outdated ones) are good
> indicators of differences between translations but, once again, they
> are useless for initializing and updating translations.
> 
> A side effect of merging different translations (but IMO an
> important one) is that a lot of translated messages are switched to
> fuzzy.
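
To illustrate why merging switches messages to fuzzy, here is a minimal
Python sketch. It is a simplified model, not how msgcat is implemented:
the function name merge_catalogs and the dict-based catalog layout are
my own assumptions, and the sample file names are made up.

```python
from collections import defaultdict

def merge_catalogs(catalogs):
    """Merge (filename, {msgid: msgstr}) pairs into one mapping.

    Simplified model: a msgid whose input files disagree on the
    translation is flagged fuzzy and keeps every distinct variant.
    """
    seen = defaultdict(list)
    for filename, entries in catalogs:
        for msgid, msgstr in entries.items():
            seen[msgid].append((filename, msgstr))

    merged = {}
    for msgid, variants in seen.items():
        # Empty strings stand for untranslated messages and are ignored.
        distinct = sorted({msgstr for _, msgstr in variants if msgstr})
        merged[msgid] = {
            "files": [filename for filename, _ in variants],
            "translations": distinct,
            "fuzzy": len(distinct) > 1,
        }
    return merged

# Hypothetical input: two packages translate "Open" differently.
catalogs = [
    ("pkg-a.po", {"File": "Súbor", "Open": "Otvoriť"}),
    ("pkg-b.po", {"File": "Súbor", "Open": "Otvor"}),
]
merged = merge_catalogs(catalogs)
```

In this model "File" stays a normal translated message, while "Open"
becomes fuzzy and carries both variants, which is the effect described
above.
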
> 
> (3) Finally, it contains a lot of obsolete messages. Unlike the
> previous items, I consider that these can (not must) be useful for
> translators, so I leave this for a later discussion.
> 
> Here are some statistics, to give a sense of the amount of the
> information mentioned above, which I collected from
> compendium-sk-stamp20120810.po:
> 
>                    size (B)      % of orig
> original           40 080 124    100
> without (1)        23 055 240    57.5   (a)
> without (2)        25 931 817    64.7   (b)
> without (1+2)      17 999 483    44.9   (c)
> without (1+2+3)    14 967 094    37.3   (d)
> 
>  * (a) was produced from the original by removing comments,
>    references and contexts with sed and stripping the PO header
>    manually (I really do not know how to clean it with a tool).
>  * (b) was generated from the original by "msgattrib --no-fuzzy".
>  * (c) was generated from (a) by "msgattrib --no-fuzzy".
>  * (d) was generated from (a) by "msgattrib --translated
>    --no-fuzzy"; the same result should come from (c) by "msgattrib
>    --translated".
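
The comment stripping behind (a) can be sketched in Python as below.
This is not the sed command I used, only an approximation under two
assumptions of mine: flag lines ("#,") are kept, and obsolete entries
("#~") are kept too, because they are messages rather than comments.
The names strip_comments and percent_of_original are made up for this
example.

```python
def strip_comments(po_text):
    """Drop PO comment lines, keeping flags ("#,") and obsolete
    entries ("#~"). Assumption: every other line starting with "#"
    is a translator comment, extracted comment, reference or
    previous-msgid line."""
    kept = []
    for line in po_text.splitlines():
        if line.startswith("#") and not line.startswith(("#,", "#~")):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"

def percent_of_original(size, original_size):
    """Size of a reduced file as a percentage of the original."""
    return round(100 * size / original_size, 1)

sample = (
    "# translator comment\n"
    "#: src/main.c:42\n"
    "#, fuzzy\n"
    'msgid "Open"\n'
    'msgstr "Otvor"\n'
)
stripped = strip_comments(sample)
ratio_a = percent_of_original(23055240, 40080124)  # row (a) of the table
```

Here stripped keeps only the flag line and the msgid/msgstr pair, and
ratio_a reproduces the percentage given for (a).
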
> 
> 2, Generate compendium with '--use-first' option (One, use first)
> =================================================================
> 
> With this option, msgcat takes all data only from the first
> occurrence. The resulting compendium will have all comments and
> header information and will preserve the translated status of the
> messages, but everything only from the first occurrence (file), and
> it is hard to define which file will be first :-)
> 
> I cannot give the size of this file yet, because I have not
> downloaded the full translation material, but in my opinion the
> result will be about 80 % of the original size.
> 
> The resulting compendium can be useful for initializing and updating
> translations, but it has one problem, caused by the "random" nature
> of the "first occurrence": some translated messages can contain an
> unwanted form of the translation.
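
A small Python sketch of the "first occurrence wins" problem follows.
It only models the idea and is not msgcat's code; the function name
merge_use_first, the catalog layout and the file names are my
assumptions, as is the choice to skip untranslated (empty) entries.

```python
def merge_use_first(catalogs):
    """Keep, for each msgid, the first non-empty translation found.

    catalogs is a list of (filename, {msgid: msgstr}) pairs; the
    order of that list alone decides which translation survives.
    """
    merged = {}
    for filename, entries in catalogs:
        for msgid, msgstr in entries.items():
            if msgstr and msgid not in merged:
                merged[msgid] = (msgstr, filename)
    return merged

# Hypothetical input: the same msgid translated in two ways.
pkg_a = ("pkg-a.po", {"Open": "Otvoriť"})
pkg_b = ("pkg-b.po", {"Open": "Otvor"})

a_first = merge_use_first([pkg_a, pkg_b])
b_first = merge_use_first([pkg_b, pkg_a])
```

The two results differ only because the input order differs, so the
compendium content would depend on how the files happen to be listed.
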
> 
> 3, Generate as is, but remove comments and fuzzy (One,
> nocomments,no-fuzzy)
> ===========================================================================
> 
> With this option, the compendium is generated as it is now, but
> afterwards the comments and fuzzy messages are stripped.
> 
> The resulting compendium will be about 50 % of the original size
> (this can depend on the language), but it will be fully usable for
> initializing and updating translations, because it will contain only
> translated messages. All differing translations will be lost and
> will not poison the users :-)
> 
> 4, Generate as is, but remove comments, fuzzy and obsolete (One,
> no-fuzzy, no-obsolete)
> ==========================================================================
> 
> This option is similar to the previous one, but the obsolete
> messages are removed too. I do not know enough to judge how useful
> obsolete messages are for initializing and updating translations,
> but removing them can make the output smaller (by about another
> 10 %).
> 
> 5, Generate two files - one some from previous options and one with
> fuzzy messages (Two, some and fuzzy)
> =========================================================================
> 
> This option is for fine-tuning translations: it provides two files.
> One is generated by one of the previous options (to be selected
> later) and one contains only the fuzzy messages. Most of these fuzzy
> messages (I hope) can be used to find differences between
> translations and help translation teams improve their translations.
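
Options 3, 4 and 5 differ only in what is filtered out and where it
goes, which the following Python sketch tries to show. It is an
illustration only: the entry layout (dicts with "msgstr", "fuzzy" and
"obsolete" keys) and the name split_compendium are assumptions of mine,
and a real implementation would more likely call msgattrib.

```python
def split_compendium(entries, drop_obsolete=False):
    """Split merged entries into (usable, review).

    usable: translated, non-fuzzy entries (options 3 and 4);
            obsolete ones are dropped when drop_obsolete is True.
    review: fuzzy entries, kept apart so that teams can compare
            conflicting translations (second file of option 5).
    """
    usable, review = [], []
    for entry in entries:
        if entry.get("fuzzy"):
            review.append(entry)
            continue
        if not entry.get("msgstr"):
            continue  # untranslated: useful in neither file
        if drop_obsolete and entry.get("obsolete"):
            continue
        usable.append(entry)
    return usable, review

# Hypothetical sample entries.
entries = [
    {"msgid": "File", "msgstr": "Súbor"},
    {"msgid": "Open", "msgstr": "Otvor", "fuzzy": True},
    {"msgid": "Old", "msgstr": "Starý", "obsolete": True},
    {"msgid": "New", "msgstr": ""},
]
usable3, review = split_compendium(entries)                 # option 3
usable4, _ = split_compendium(entries, drop_obsolete=True)  # option 4
```
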
> 
> 6, Something else
> =================
> 
> If you have other solutions or ideas, please share them.
> 
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> 
> Besides the points mentioned so far, there is another aspect: the
> server usage (CPU, disks, etc.) during generation, and the code
> changes needed. Here is how I see it:
> 
>  * option 1 is the simplest to implement and adds no server load :-D
>  * option 2 is very simple to implement and IMO adds no server load
>  * options 3 and 4 are simple to implement, but add server load
>  * option 5 depends on the solution finally selected, but will not
>    be hard to implement, and it seems it can add some server load
> 
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
> 
> Finally, I want to know the opinion of the other translators (and
> non-translators too), so I have prepared a simple poll. The poll
> options are named as in the brackets after the options above.
> 
> Please give your opinion here:
> http://slavino.sk/debian-kompendium. Note that it is my personal
> page and therefore mostly in Slovak; I am sorry for this :-)
> 
> regards
> 



-- 
Slavko
http://slavino.sk

Attachment: signature.asc
Description: PGP signature
