Re: Migrate website translations to PO [was: Re: When and how can we migrate out of CVS and WML ?]



Hello,
throwing in my experience from the German translation ...

On Wed, Aug 11, 2010 at 09:28:14AM -0400, David Prévot wrote:
> Le 06/08/2010 17:23, Gerfried Fuchs a écrit :
> > 	Hi!
> > 
> > * Andrei Popescu <andreimpopescu@gmail.com> [2010-08-05 09:17:59 CEST]:
> >> On Vi, 30 iul 10, 11:15:17, Andrei Popescu wrote:
> >>>
> >>> Moving to .po probably needs a coordinated effort including at least the 
> >>> coordinators from all the languages that have more than just a few 
> >>> translated pages.
> >>>
> >>> Is there some wiki page about this project? I can start one, but not 
> >>> until tomorrow.
> >>
> >> It took a bit longer, but the page is 
> >> http://wiki.debian.org/DebianWebsitePO

Additional pro: Some parts of the website (namely vote) have several
recurring paragraphs, for which I maintain a dedicated script to save
the trouble of retranslation. Unfortunately, nobody so far was
interested in converting it to po (cf. #364913). The move will only
help if po is used properly, i.e. with variables (as the perl script
I'm maintaining essentially does by using regular expressions).
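
To illustrate the "variables" idea, here is a minimal sketch in Python.
It is not the perl script from #364913; the sentence pattern, the German
template and the function name are hypothetical and only show how a
recurring paragraph can be translated once and reused with different
values filled in.

```python
import re

# Hypothetical recurring paragraph: one pattern for the English text,
# one stored translation with a placeholder for the variable part.
# Neither is taken from the real vote pages.
TEMPLATES = [
    (
        re.compile(r"^Voting period ends on (?P<date>.+)\.$"),
        "Die Abstimmung endet am {date}.",
    ),
]


def translate_recurring(paragraph):
    """Return the stored translation with variables filled in, or None."""
    for pattern, template in TEMPLATES:
        match = pattern.match(paragraph)
        if match:
            # Named groups from the English text fill the placeholders.
            return template.format(**match.groupdict())
    return None
```

A paragraph that matches no pattern returns None and still has to be
translated by hand.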

On the con side: Working with huge paragraphs with po is a pain,
especially with limited screen space. More below.

> >> Input from people familiar with po4a and (other) translators would be 
> >> highly needed.

:-))

> >  Often parts aren't properly in context, moved around within the po file
> > and get confusing when only working on the po file. If done carefully
> > this might be solved but it is something that shouldn't be ignored for
> > proper decision making.
> 
> I fail to understand the issue here : when working on a blank WML file,
> there is no context at all, the only context is in the original file,
> which needs to be used when translating a WML file, and can also be used
> when working on a PO file.

Blank wml file? I usually copy the English file to the German location
and then work on the English text, replacing it paragraph by paragraph
with the translation. I have all the context I need. What's unclear to
me is whether there will be one po file, one per directory, or
thousands of them. The saved work could only be gained if there are as
few po files as possible (e.g. one for vote, one for News, one for DWN,
...), but this would be a hassle as well, as huge files would need to be
moved back and forth into CVS. Also, if a team has several translators,
they might "fight" over a certain file, even if they work on different
parts. Of course,
context could be problematic, and 

If, however, there are lots of small po files, then I fail to see the
advantage (except for the rare case of moving paragraphs). Review of
updated translations is easily done using "cvs diff", both on the
original and the translation. Nothing else is required.

> >  This is related to that po is for translating more-or-less text
> > snippets that are meant to be able to stand on their own. Having a text
> > separated into multiple strings, with always the English part in between
> > does IMHO block some quality possibilities of having the text flow
> > naturally because it doesn't make the final proofreading as easy.
> 
> On the contrary, providing the original text while asking for review
> makes it easier for reviewer to understand what it is about (and
> eventually spot translation mistakes), without needing them to search
> for the ad-hoc part of the original text somewhere on the website.

Yes, this eases review in a certain way. I don't see text flow issues,
just that for large po files the (wanted) reuse of original text might
look like a cloze, so people might miss parts even though they are
there.

Btw., this depends on how you do your review. Why not send two files to
your reviewers? Or teach them CVS, so they can use "cvs diff" as well?
Also, I would not let them search; I'd provide the link myself, if
need be. So it really boils down to review methods and standards in
each team.

> >  Also, translating longer paragraphs gets annoying, especially when the
> > original gets changed. It will mark the string as fuzzy and the
> > translator has to dig around in a longer paragraph about what actually
> > has changed. One solution to this might be the --previous switch which
> > keeps the former string in there for comparison -- but are there
> > translation tools that support that properly and can highlight the changes
> > in a wdiff form? Maybe I missed some development in that area, feel free
> > to enlighten me. As long as such a tool isn't available I consider that
> > as a real issue.
> 
> It's one of the feature of Lokalize, don't know if it is implemented in
> other tools, but yes: Lokalize provide a colored diff inline between the
> old original text and the new one, and make it easy to spot what has
> been changed on the paragraph.

Well, the decision should not be based on a single tool. I use vim,
and I haven't checked it in Squeeze yet, but in Lenny I don't get such
help. IMHO at least some tools should provide it. Also, I found "cvs
diff" very helpful (much better than working with po) as long as the
original author did not needlessly insert line breaks or reformat
paragraphs. Of course, simply moving content around (done for books,
for example) would be easier in po, as no manual intervention would be
required.
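
For editors without such support, the comparison can be approximated
outside the editor. The following is a minimal sketch in Python, using
only the standard difflib module, of a wdiff-like comparison between the
previous and the current original string of a fuzzy entry (with
--previous, the old string is kept in the po file in "#| msgid"
comments). The function name and the [-...-] / {+...+} markers are
choices made for this example; the sketch does not parse po files, it
just compares two strings.

```python
import difflib


def word_diff(previous, current):
    """Mark words removed from `previous` and words added in `current`."""
    old_words = previous.split()
    new_words = current.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words are passed through as they are.
            pieces.extend(old_words[i1:i2])
            continue
        if i2 > i1:
            # Words only present in the previous original.
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j2 > j1:
            # Words only present in the current original.
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


print(word_diff("The vote ends on Monday.", "The vote ends on Tuesday."))
# The vote ends on [-Monday.-] {+Tuesday.+}
```

Because the comparison is word based, reflowed line breaks in the
original do not show up as changes.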

> >  The last issue I see is with the core way po works: If it finds
> > an untranslated or fuzzy string it will put the English original into
> > the place. This might be something useful for applications to specifically
> > support work-in-progress approaches and not render a translation invalid
> > for a string that might only be an error message or such - but then I
> > don't consider this as an acceptable approach for the website. It would
> > be quite confusing for people to see a mix of english and their own
> > language on the same page and switch like from every paragraph to the
> > next. I *do* consider it better in that cases to have a potential
> > (minorly) outdated page but completely in the native language than a mix
> > of english and their language.

Well, this is the great *advantage* of po. Sorry. I work on a long
page, then someone makes several minor (yet nontrivial) changes, and now
my page is out of date and "gone". If only one or two paragraphs
appeared in English, well, not fine, but the rest would still be valid
and accessible. And this would also help in the discussion about
removing outdated translations - they would simply stay, but over time
become more and more English. Also, right now, in the case described, I
need to work on this file, even if I have other priorities, because I
either follow the original (closely) or lose the entire translation.
With po, I could prioritize and say: "This paragraph is currently not so
important to have translated, the main paragraphs are up to date, so I
first work on some other update/new translation and revisit this file
later."
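
The per-paragraph fallback discussed here can be sketched in a few lines
of Python. This is an illustration only, not how po4a or the website
build is implemented; the entry layout (dicts with "msgid", "msgstr" and
"fuzzy" keys) and the function name are assumptions made for the
example.

```python
def render_page(entries):
    """Build the page text, using the English original for any
    paragraph that is untranslated or marked fuzzy."""
    paragraphs = []
    for entry in entries:
        if entry["msgstr"] and not entry["fuzzy"]:
            # Translation exists and matches the current original.
            paragraphs.append(entry["msgstr"])
        else:
            # Untranslated or outdated: fall back to the original.
            paragraphs.append(entry["msgid"])
    return "\n\n".join(paragraphs)


page = render_page([
    {"msgid": "Welcome.", "msgstr": "Willkommen.", "fuzzy": False},
    {"msgid": "New rules apply.", "msgstr": "Alte Regeln.", "fuzzy": True},
    {"msgid": "Contact us.", "msgstr": "", "fuzzy": False},
])
print(page)
```

In this example only the first paragraph comes out in German; the other
two appear in English until the translator gets to them.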

> It might be possible to trick the usual PO workflow, by keeping the
> generated WML file in VCS, and update it if and only if the translator
> updates the PO file.

Then we lose the po advantage completely - please keep wml in this
case.

> Anyway, even if I understand the "please keep fully (even outdated)
> translated pages" argument, I don't think it applies to the whole site.
> For example, I think it would be better if developers' related stuff
> would be kept up to date, even if not the whole page is translated
> (rationale: developers needs to interact in English anyway, even if
> translated documentation is helpful, up to date documentation is more
> important).

Yes. And users usually grok some English, so if it is only a
paragraph or two, they can manage. But if they get only English pages,
they turn away.

> >  Also, in some areas we do encourage adding language specific
> > information - I'm not too sure how that should work with po4a. Also, in
> > some specific situations it happened that translators have changed the
> > formatting of a page (like separating/merging two paragraphs) and it
> > might make sense for them to keep that possibility. Different languages
> > do have different representation requirements.
> 
> Actually po4a can handle addendum, which is a nice way to add some more
> information (it is often used to add translator credits for example).

Well, I think splitting paragraphs has nothing to do with addendums. I
don't see addendums for the website, currently at least. 

I would just point out that plain text files (like wml) are very
suitable for (CVS) diff, so minor differences can be easily spotted,
while I have already spent quite some time comparing fuzzy paragraphs
(even with --previous) to find the change - which often turned out
to be very minor. So unless good tool support is provided, I would be
cautious.

If, on the other hand, the (English) original remains diffable, then
this might work, getting the best of both worlds. (Though I also like
to diff other people's translations once in a while, which I would
lose.)

My main wish currently would be to get a more powerful VCS.
Performance is rather poor with CVS. Also, when using po, who will
update the po files? It needs to be done automatically to get the
most up-to-date pages (mixing in English text where necessary), but
then each CVS update would become huge, if only for the updated "Last
updated" stanza. If, on the other hand, each translator updates the po
files herself (as currently), then we would lose the major advantage
of staying up to date even if some parts of the pages change (see
above). How is this planned?

And a final nitpick: Which translation teams have the manpower to
do the conversion? The German team currently works on moving the
text-based translations of man pages to po-based ones, and this turns
out to be a huge effort. For the website, we at least know if a file
is up to date, but I guess quite some effort is still required (or we
hope that paragraph n in the original corresponds to paragraph n in
the translation and mass convert without review).
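
The "paragraph n corresponds to paragraph n" assumption can at least be
checked mechanically before a mass conversion. The following Python
sketch is an illustration only: it splits on blank lines, which is a
simplification (real wml pages also contain tags and includes), and the
function names are made up for this example.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs at blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def pair_paragraphs(original, translation):
    """Pair paragraph n of the original with paragraph n of the
    translation; refuse if the paragraph counts differ."""
    orig = split_paragraphs(original)
    trans = split_paragraphs(translation)
    if len(orig) != len(trans):
        raise ValueError(
            "paragraph count differs: %d vs. %d" % (len(orig), len(trans))
        )
    return list(zip(orig, trans))
```

A matching count does not prove that the contents correspond (a
translator may have merged one paragraph and split another), so the
pairs would still need review.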

So my POV: Moving to po is an option, but reading the web page cited
above still leaves many questions open before such a move can be
considered.

Greetings

           Helge
-- 
      Dr. Helge Kreutzmann                     debian@helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/
