
Re: Effort to reduce translatable strings in debian-edu-doc manuals



Hi Frans,

thanks for sharing your thoughts.

On Tue, Jul 21, 2020 at 10:07:40PM +0200, Frans Spiesschaert wrote:
> While following up on the translation of debian-edu-doc manuals via
> weblate, I sometimes notice that only trivial translation updates are
> needed. This is the case, for example, when a new person contributed to a
> translation, which changes the copyright information about that
> translation, requiring all translations to be updated again.

Indeed, this is annoying.
 
> I think it should be possible to reduce the number of strings to be
> translated by filtering out from the POT file as many non-translatable
> strings as possible. This can be achieved via the po4a program, which is
> already being used for building the debian-edu-doc manuals.

While this is possible, it's not trivial at all.

> What follows is a kind of a concept note to clarify what I am thinking
> about. I took the audacity manual as an example, because it is very limited
> in size.
> What I propose is far from being ready to go into production. It still
> needs further development. Also how exactly to integrate this into the
> current building process of the PO and POT files and of the translated
> documentation, has yet to be elaborated.

Yes, this is bound to be complicated.
 
> Here goes the proposal.
> 
> ***1*** about the actual po4a.cfg
> 
> The actual po4a.cfg looks like this:
> 
> [po_directory] .
> 
> [type: docbook] audacity-manual.xml \
> 	$lang:$lang.xml \
> 	opt:"-o untranslated='<listitem>' -M UTF-8 -k 15"
> 
> As far as I can tell the option -o untranslated='<listitem>'
> has no effect on the audacity manual and could be dropped safely.

Right; this stanza seems to stem from an effort (long ago) to exclude 
some strings from being translated.

> ***2*** proposal for an updated po4a.cfg
> 
> I would propose to add a pot_in element to po4a.cfg
> this would make po4a.cfg look like this.
> 
> [po_directory] .
> 
> [type: docbook] audacity-manual.xml \
>     pot_in:audacity-manual_stripped.xml \
>         $lang:$lang.xml \
>         opt:"-M UTF-8 -k 15"
> 
> Explanation:
> With po4a it is possible to hide some strings from translators.
> For that purpose one can build an audacity-manual_stripped.xml (this is a
> random name) that omits untranslatable strings and is used for
> the creation of the pot and po files, while the original xml file
> (audacity-manual.xml in this case) will be used to build the translated
> documents.

Right.
 
> ***3*** an example script to create audacity-manual_stripped.xml
> from audacity-manual.xml.
> 
> For each manual such a script will be different, because the paragraphs
> that need to be kept out of the POT files will differ for each manual.

The master XML file (in this case audacity-manual.xml) is generated via 
'make update' from the related AllInOne wiki page. This procedure is 
already far from being trivial, including various hacks.

> Such a script could be located in the scripts directory under
> debian-edu-doc/documentation.
> For the audacity manual I in fact propose two scripts: a dash script (the
> main script) that also calls a sed script.

> I hope the scripts are self-explanatory.
> 
> ** Below, firstly, the dash script (ugly named strip-aud-untrans.sh) **
> 
> #!/bin/sh
> 
> if [ $# -ne 2 ]
> then
>      echo "   You need to pass exactly 2 parameters." >&2
>      printf '   Usage:\n     %s\n' "./strip-aud-untrans.sh audacity/audacity-manual.xml audacity/audacity-manual_stripped.xml" >&2
>      # exit with a non-zero status so callers can detect the misuse
>      exit 1
> fi
> 
> # call sed script strip-aud.sed to strip some
> # untranslatable paragraphs from audacity-manual.xml.
> ./strip-aud.sed "$1" > "$2"
> 
> # re-append the last two lines to prevent the XML from being broken.
> sed -i '$s%^$%</section>\n</article>%' "$2"
> 
> ** Below, secondly, the sed script (equally ugly named strip-aud.sed) **
> 
> #!/bin/sed -f
> 
> # Usage: ./strip-aud.sed audacity/audacity-manual.xml \
> #          > audacity/audacity-manual_stripped.xml
> 
> # delete paragraphs that just have a <ulink>
> # (no need to translate html links)
> /^<para><ulink/,+1d
> 
> # delete paragraphs that just have a <inlinemediaobject>
> # (no need to translate the name of inserted images)
> /^<para><inlinemediaobject>/,+1d

IMO this would mean dropping the alt names for the images (most of them 
have one), which might not be wanted. (I tried to get rid of the image 
issue some time ago and refrained from doing so.)
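
To illustrate the alt name concern, here is a minimal sketch that runs the
same two range expressions on a made-up sample. The sample paragraphs, the
file name images/rec.png and the helper name strip_paragraphs are assumptions
for illustration only; the real audacity-manual.xml layout may differ. Note
that the "addr,+1" address form is a GNU sed extension.

```shell
#!/bin/sh
# Hypothetical sample input; not taken from the real audacity-manual.xml.
sample='<para>Record with Audacity.</para>
<para><ulink url="http://example.org/"/>
</para>
<para><inlinemediaobject><imageobject><imagedata fileref="images/rec.png"/></imageobject><textobject><phrase>record button</phrase></textobject></inlinemediaobject>
</para>
<para>Press stop when done.</para>'

# The same two range expressions as in strip-aud.sed (GNU sed required).
strip_paragraphs() {
    sed -e '/^<para><ulink/,+1d' -e '/^<para><inlinemediaobject>/,+1d'
}

stripped=$(printf '%s\n' "$sample" | strip_paragraphs)
printf '%s\n' "$stripped"

# Count how many alt texts (<phrase> elements) survive the stripping.
alt_count=$(printf '%s\n' "$stripped" | grep -c '<phrase>' || true)
echo "alt texts left for translators: $alt_count"
```

In this sample the alt text "record button" sits on the same line as the
image reference, so it disappears from the stripped file together with the
image name.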
 
> # remove everything from <section id='18'> up to end of file
> # this is the GNU GENERAL PUBLIC LICENSE
> # (not to be translated according to the text in the manual)
> # (perhaps it is even better to also strip section 17)
> /<section id='18'>/,$d
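
One detail worth checking here: the re-append step in the dash script only
fires if the last remaining line is empty. The sketch below runs both
commands on two made-up inputs to show that dependency. The inputs and the
helper name strip_and_close are assumptions; whether the real file has an
empty line before section 18 is not verified here. The "\n" in the
replacement text relies on GNU sed.

```shell
#!/bin/sh
# Same two commands as in the proposed scripts, chained in a pipeline.
strip_and_close() {
    sed "/<section id='18'>/,\$d" | sed '$s%^$%</section>\n</article>%'
}

# Hypothetical input with an empty line right before section 18.
with_blank=$(printf '%s\n' '<article>' '<section id="1">' \
    '<para>Text</para>' '' "<section id='18'>" '<para>GPL</para>' \
    '</section>' '</section>' '</article>' | strip_and_close)

# Hypothetical input without that empty line.
without_blank=$(printf '%s\n' '<article>' '<section id="1">' \
    '<para>Text</para>' "<section id='18'>" '<para>GPL</para>' \
    '</section>' '</section>' '</article>' | strip_and_close)

echo "last line with blank:    $(printf '%s\n' "$with_blank" | tail -n 1)"
echo "last line without blank: $(printf '%s\n' "$without_blank" | tail -n 1)"
```

In the second case the closing tags are not restored, so the stripped file
would no longer be well-formed XML.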
> 
> ***4*** Discussion
> 
> The above proposal only strips a very reduced number of paragraphs.
> More paragraphs could be stripped, provided a slightly different wording
> of the audacity manual.
> 
> For example (this wording uses two paragraphs, which would make it possible
> to strip the paragraph with the html link):
> 
>    There exists a frequently updated wiki version at:
>    <ulink url="http://wiki.debian.org/DebianEdu/Documentation/Manuals/Audacity"/>
> 
> instead of (the actual wording uses one paragraph):
> 
> The version at
> <ulink url="http://wiki.debian.org/DebianEdu/Documentation/Manuals/Audacity"/>
> is a wiki and updated frequently.
> 
> 
> One also could reword the translation copyright paragraphs (using two
> paragraphs):
> 
>     The Dutch translation is released under the GPL v2 or any later version
> and is copyrighted by:
>     Frans Spiesschaert (2014, 2018, 2019, 2020)
> 
> instead of (the actual wording in one paragraph):
> 
> The Dutch translation is copyrighted by Frans Spiesschaert (2014, 2018,
> 2019, 2020) and is released under the GPL v2 or any later version.
> 
> Doing so, one could keep the name of the translators out of the pot file.
 
As far as the translators issue is concerned:

IMO this could be dropped completely. Looking at other comparable 
translated manuals (e.g. debian-reference), the debian/copyright file 
simply states:

I have asked all contributors including translators to 
license their work under the same copyright as I did (GPL).  List of 
translator names are not included here. See: 
https://salsa.debian.org/debian/debian-reference/-/blob/master/debian/copyright

Another example here:
No translators mentioned, see:
https://salsa.debian.org/hertzog/debian-handbook/-/blob/buster/master/debian/copyright

Or here:
https://salsa.debian.org/debian/developers-reference/-/blob/master/debian/copyright

And yet another possibility:
...and by other contributors, see:
https://salsa.debian.org/installer-team/installation-guide/-/blob/master/debian/copyright

The translators could be credited, though - via an (optional) addendum file for a 
specific translation.

For the Bullseye manual translations, a related po4a.cfg file and 
sample {de,nl,nb} addendum files are attached. With these files present in 
'documentation/debian-edu-bullseye' of a local debian-edu-doc git clone, 
running 'LINGUA=nb make html' will generate a modified HTML manual file 
which can be opened locally. Click the first link on the page to see the 
translator credits. The same applies to (nl) and (de).
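
For anyone wanting to prepare such an addendum for another language, the
layout of the attached samples can be sketched as follows. The helper name
make_addendum is hypothetical; the header line and the <screen> wrapper
simply mirror the attached files, and the position value has to match a
string that occurs in the translated document.

```shell
#!/bin/sh
# make_addendum POSITION TITLE  (hypothetical helper)
# Reads one credit line per translator on stdin and prints an addendum
# in the same layout as the attached sample files.
make_addendum() {
    position=$1
    title=$2
    printf 'PO4A-HEADER: mode=after; position=%s; endboundary=<section>\n' "$position"
    printf '<screen>\n%s\n' "$title"
    # Indent every credit line by two spaces, as in the samples.
    sed 's/^/  /'
    printf '</screen>\n'
}

nl_addendum=$(printf '%s\n' '2014-2020 Frans Spiesschaert' |
    make_addendum 'Publicatiedatum' 'Auteur van vertaling:')
printf '%s\n' "$nl_addendum"
```

The result would be saved as nl.add next to po4a.cfg, where the
'add_$lang:?./$lang.add' line picks it up if it exists.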

Wolfgang

Attached po4a.cfg:

[po_directory] .

[type: docbook] debian-edu-bullseye-manual.xml \
	$lang:$lang.xml \
	add_$lang:?./$lang.add \
	opt:"-M UTF-8 -k 15"

Attached de.add:

PO4A-HEADER: mode=after; position=Publikationsdatum; endboundary=<section>
<screen>
Autoren der Übersetzung:
  2007 Holger Levsen
  2007 Patrick Winnertz
  2007, 2009 Ralf Gesellensetter
  2007, 2008, 2009 Roland F. Teichert
  2007, 2009, 2011, 2014 Jürgen Leibner
  2008, 2010 Ludger Sicking
  2008 Kai Hatje
  2009 Kurt Gramlich
  2009 Franziska Teichert
  2009 Philipp Hübner
  2009, 2010 Andreas Mundt
  2012-2020 Wolfgang Schweer
</screen>

Attached nb.add:

PO4A-HEADER: mode=after; position=Utgivelsesdato; endboundary=<section>
<screen>
Forfattere av oversettelse:
  2007, 2012, 2014-2020 Petter Reinholdtsen
  2007-2009 Håvard Korsvoll
  2008 Tore Skogly
  2010 Ole-Anders Andreassen
  2010 Jan Roar Rød
  2014, 2016, 2017 Ole-Erik Yrvin
  2014, 2015, 2016, 2017 Ingrid Yrvin
  2014 Hans Arthur Kielland Aanesen
  2014 Knut Yrvin
  2014 FourFire Le'bard
  2014 Stefan Mitchell-Lauridsen
  2014 Ragnar Wisløff
  2018-2020 Allan Nordhøy
</screen>

Attached nl.add:

PO4A-HEADER: mode=after; position=Publicatiedatum; endboundary=<section>
<screen>
Auteur van vertaling:
  2014-2020 Frans Spiesschaert
</screen>
