Hi all,

I am interested in your opinions about compendia generation. So far I
have received only two mails (outside this ML) and only one vote in the
poll I prepared (http://slavino.sk/debian-kompendium). The poll is open
until 31 August 2012.

Please, does nobody want to post an opinion or vote on this?

thanks

On Tue, 14 Aug 2012 10:03:35 +0200 Slavko <linux@slavino.sk> wrote:

> Hi all,
>
> I am starting a new thread for this, because it is a separate
> problem.
>
> On Sun, 12 Aug 2012 17:30:38 +0200 Christian PERRIER
> <bubulle@debian.org> wrote:
>
> > > What is your opinion, please?
> >
> > All these ideas seem to be good ideas.
>
> OK, I am now able to set up my machine to generate the compendia
> locally, for testing purposes. I think I understand the ideas behind
> the script (I hope), but now there are some questions.
>
> There are several ways to generate the compendia, and because I am
> looking for the proper solution, I want to ask everyone which way is
> best. I see these options:
>
> 1, Leave compendium as it is (One, as is)
> =========================================
>
> I will summarize the pros and cons of the current state; this mixes
> well-known facts with my own opinions.
>
> Currently the script that generates the compendium takes all
> available PO files for the given language (converting them if
> needed; I want to discuss that separately) and merges them into one
> big PO file with the msgcat tool, without any further manipulation.
>
> (1) The resulting compendium:
> * contains information about all input files,
> * has the comments from all input files,
> * has the merged headers from all input files,
> * has all source file references from all input files.
>
> I consider all lines that start with "#" (except flags) useless for
> initializing a translation from scratch or for updating an already
> existing translation.
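To make the point about "#" lines concrete, here is a minimal sketch in Python of dropping them. It is only an illustration, not part of the compendium script and not the sed command used for the statistics. The function name strip_po_comments and the sample entry are made up for the example. It assumes that "#," lines (flags such as fuzzy) and "#~" lines (obsolete entries) should be kept, so that fuzzy and obsolete messages can still be handled separately afterwards.

```python
def strip_po_comments(po_text: str) -> str:
    """Drop '#' comment lines from PO text, keeping flags and obsolete entries."""
    kept = []
    for line in po_text.splitlines():
        # Assumption: "#," carries flags (e.g. fuzzy) and "#~" marks an
        # obsolete entry; both are kept, every other "#" line is dropped.
        if line.startswith("#") and not line.startswith(("#,", "#~")):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical sample entry, not taken from the real compendium.
sample = '''# Translator comment
#. Extracted comment
#: src/main.c:42
#, fuzzy
msgid "File"
msgstr "Subor"

#~ msgid "Old"
#~ msgstr "Stare"
'''

print(strip_po_comments(sample), end="")
```

The translator comment, the extracted comment and the source reference are removed; the flag line, the message itself and the obsolete entry stay.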
>
> (2) Apart from this information, the final compendium merges all
> the different translations of each message. The result is a fuzzy
> message that lists all the files where this msgid exists and all the
> translation variants (really all of them; identical ones appear
> several times). It also contains all the untranslated strings from
> all input files.
>
> These fuzzy messages (except the really outdated ones) are good
> indicators of differences between translations but, once again, they
> are useless for initializing and updating translations.
>
> A side effect (but IMO an important one) of merging different
> translations is that a lot of translated messages are switched to
> fuzzy.
>
> (3) Finally, it contains a lot of obsolete messages. Unlike the
> previous items, I consider these potentially (not necessarily)
> useful for translators, so I leave this for further discussion.
>
> Here are some statistics to give a sense of the amount of this
> information, measured on compendium-sk-stamp20120810.po:
>
>                      size (B)    % of orig
> original           40 080 124     100
> without (1)        23 055 240      57.5  (a)
> without (2)        25 931 817      64.7  (b)
> without (1+2)      17 999 483      44.9  (c)
> without (1+2+3)    14 967 094      37.4  (d)
>
> * (a) has the comments, references and contexts removed by sed and
>   the PO header stripped manually (I really don't know how to clean
>   it with some tool) from the original.
> * (b) was generated from the original by "msgattrib --no-fuzzy".
> * (c) was generated from (a) by "msgattrib --no-fuzzy".
> * (d) was generated from (a) by "msgattrib --translated
>   --no-fuzzy"; the same result should come from (c) by "msgattrib
>   --translated".
>
> 2, Generate compendium with '--use-first' option (One, use first)
> =================================================================
>
> With this option, msgcat takes each message's data only from its
> first occurrence.
> The resulting compendium will have all comments and header
> information and will preserve the translated status of each message,
> but everything only from the first occurrence (file), and it is
> hard to define which file will be first :-)
>
> Currently I cannot give the size of this file, because I have not
> yet downloaded the full translation material, but in my opinion the
> result will be about 80 % of the original size.
>
> The resulting compendium can be useful for initializing and updating
> translations, but it has one problem, caused by the "random" nature
> of the "first occurrence": some translated messages can contain an
> unwanted form of the translation.
>
> 3, Generate as is, but remove comments and fuzzy (One,
> nocomments,no-fuzzy)
> ===========================================================================
>
> With this option, the compendium is generated as it is now, but the
> comments and fuzzy messages are stripped after generation.
>
> The resulting compendium will be about 50 % of the original size
> (this can depend on the language), but it will be fully usable for
> initializing and updating translations, because it will contain only
> translated messages. All the differing translations will be lost and
> will not poison the users :-)
>
> 4, Generate as is, but remove comments, fuzzy and obsolete (One,
> no-fuzzy, no-obsolete)
> ==========================================================================
>
> This option is similar to the previous one, but the obsolete
> messages are removed too. I do not know enough to judge how useful
> obsolete messages are for initializing and updating translations,
> but removing them can make the output smaller (about another 10 %).
>
> 5, Generate two files - one some from previous options and one with
> fuzzy messages (Two, some and fuzzy)
> =========================================================================
>
> This option is for tweaking translations: it provides two files.
> One is generated by one of the previous options (to be selected
> later) and the other contains only fuzzy messages. Most of these
> fuzzy messages (I hope) can be used to find translation differences
> and to help translation teams make their translations better.
>
> 6, Something else
> =================
>
> If you have other solutions/ideas, please share them.
>
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>
> There are also other aspects not mentioned yet: the server usage
> (CPU, disks, etc.) during generation, and the code changes needed.
> Here is how I see it:
>
> * option 1 is the simplest to implement and needs no additional
>   server usage :-D
> * option 2 is very simple to implement and IMO needs no additional
>   server usage
> * options 3 and 4 are simple to implement, but need additional
>   server usage
> * option 5 depends on the finally selected solution, but will not
>   be hard to implement, and it seems it can need some additional
>   server usage
>
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
>
> Finally, I want to know the opinion of the other translators (and
> non-translators too), so I have prepared a simple poll. The poll
> options are labelled with the names given in brackets after the
> options above.
>
> Please give your opinion here:
> http://slavino.sk/debian-kompendium. Note that it is my personal
> page and therefore mostly in Slovak; I am sorry for that :-)
>
> regards
>
--
Slavko
http://slavino.sk