
Effort to reduce translatable strings in debian-edu-doc manuals



Hi everyone,

While following up on the translation of the debian-edu-doc manuals via
Weblate, I sometimes notice that only trivial translation updates are
needed. This is the case, for example, when a new person contributes to a
translation: this changes the copyright information about that
translation, which then requires all translations to be updated again.

I think it should be possible to reduce the number of strings to be
translated by filtering as many non-translatable strings as possible out
of the POT file. This can be done with po4a, which is already used to
build the debian-edu-doc manuals.

What follows is a kind of concept note to clarify what I have in mind.
I took the audacity manual as an example, because it is quite small.
What I propose is far from ready for production and still needs further
development. How exactly to integrate this into the current process of
building the PO and POT files and the translated documentation also has
yet to be worked out.

Thoughts and comments are very welcome.

Here goes the proposal.

***1*** About the current po4a.cfg

The current po4a.cfg looks like this:

[po_directory] .

[type: docbook] audacity-manual.xml \
	$lang:$lang.xml \
	opt:"-o untranslated='<listitem>' -M UTF-8 -k 15"

As far as I can tell, the option -o untranslated='<listitem>'
has no effect on the audacity manual and could safely be dropped.
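One quick way to check this claim is to count the lines of the master
document that contain the tag: if there are none, the untranslated option
has nothing to act on. A minimal sketch (the function name count_tag_lines
is hypothetical, and the path in the usage comment is assumed):

```sh
#!/bin/sh
# Hypothetical helper: count the lines of an XML file that contain an
# opening tag. A count of 0 means an option such as
# -o untranslated='<listitem>' has nothing to act on in that file.
count_tag_lines() {
    # $1: tag name without angle brackets, $2: XML file.
    # "<tag" must be followed by a space, ">" or "/", so closing tags
    # (</tag>) and longer names (<tagfoo>) are not counted.
    # grep -c prints 0 and exits 1 when nothing matches; "|| true"
    # keeps that from being treated as a failure.
    grep -c "<$1[ >/]" "$2" || true
}

# Usage (path assumed):
# count_tag_lines listitem audacity/audacity-manual.xml
```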

***2*** Proposal for an updated po4a.cfg

I would propose adding a pot_in element to po4a.cfg, which would make
po4a.cfg look like this:

[po_directory] .

[type: docbook] audacity-manual.xml \
    pot_in:audacity-manual_stripped.xml \
        $lang:$lang.xml \
        opt:"-M UTF-8 -k 15"

Explanation:
po4a makes it possible to hide some strings from translators.
For that purpose, one can build an audacity-manual_stripped.xml (the name
is arbitrary) that omits the untranslatable strings. This file is used to
create the POT and PO files, while the original XML file
(audacity-manual.xml in this case) is still used to build the translated
documents.
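Since po4a would then extract the POT file from the stripped file, that
file has to be regenerated whenever audacity-manual.xml changes. A minimal
sketch of such a build step, assuming it runs before po4a; the function
name refresh_stripped and the paths in the usage comment are illustrative
only and not part of debian-edu-doc:

```sh
#!/bin/sh
# Hypothetical build step: regenerate the stripped copy of a manual only
# when it is missing or older than the master document.
# $1: master XML, $2: stripped XML, remaining arguments: the command that
# creates the stripped file; it is called as: command MASTER STRIPPED
refresh_stripped() {
    master=$1
    stripped=$2
    shift 2
    if [ ! -e "$stripped" ] || [ "$master" -nt "$stripped" ]; then
        "$@" "$master" "$stripped"
    fi
}

# Assumed usage (paths illustrative), followed by the usual po4a run:
# refresh_stripped audacity/audacity-manual.xml \
#     audacity/audacity-manual_stripped.xml ./strip-aud-untrans.sh
# po4a po4a.cfg
```

Skipping the regeneration when the stripped file is up to date keeps the
step cheap if it is hooked into every build.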

***3*** An example script to create audacity-manual_stripped.xml
from audacity-manual.xml

Such a script will be different for each manual, because the paragraphs
that need to be kept out of the POT files differ from manual to manual.
The scripts could be placed in the scripts directory under
debian-edu-doc/documentation.
For the audacity manual I actually propose two scripts: a dash script (the
main script) that calls a sed script.
I hope the scripts are self-explanatory.

** First, the dash script (the ugly-named strip-aud-untrans.sh) **

#!/bin/sh

# stop as soon as one of the steps fails
set -e

if [ $# -ne 2 ]
then
     echo "   You need to pass exactly 2 parameters." >&2
     printf '   Usage:\n     ./strip-aud-untrans.sh audacity/audacity-manual.xml audacity/audacity-manual_stripped.xml\n' >&2
     exit 1
fi

# call sed script strip-aud.sed to strip some
# untranslatable paragraphs from audacity-manual.xml.
./strip-aud.sed "$1" > "$2"

# re-append the last two lines (removed together with section 18
# by strip-aud.sed) so that the XML is not broken.
sed -i '$s%^$%</section>\n</article>%' "$2"

** Second, the sed script (the equally ugly-named strip-aud.sed) **

#!/bin/sed -f

# Usage: ./strip-aud.sed audacity/audacity-manual.xml > audacity/audacity-manual_stripped.xml
# Note: "addr,+N" ranges are a GNU sed extension.

# delete paragraphs that consist only of a <ulink>, together with
# the line that follows (no need to translate web links)
/^<para><ulink/,+1d

# delete paragraphs that consist only of an <inlinemediaobject>,
# together with the line that follows
# (no need to translate the names of inserted images)
/^<para><inlinemediaobject>/,+1d

# remove everything from <section id='18'> up to the end of the file:
# this is the GNU GENERAL PUBLIC LICENSE
# (not to be translated, according to the text in the manual).
# This also removes the closing tags at the end of the file, which
# strip-aud-untrans.sh re-appends.
# Perhaps it is even better to strip section 17 as well.
/<section id='18'>/,$d
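To see what the sed rules and the re-append step do together, here is a
self-contained sketch that copies the three rules and runs both steps of
strip-aud-untrans.sh on a small invented sample. The sample's layout (a
<para><ulink ...> line followed by a </para> line, an outer <section> that
encloses section 18, and an empty line before it) is an assumption about
how audacity-manual.xml is laid out. It needs GNU sed.

```sh
#!/bin/sh
# Demo of the strip-aud.sed rules plus the re-append step on an invented
# sample; the sample's line layout is an assumption, not the real manual.
set -e
work=$(mktemp -d)

# Same three rules as strip-aud.sed (quoted here-doc: no expansion of $d).
cat > "$work/strip-aud.sed" <<'EOF'
/^<para><ulink/,+1d
/^<para><inlinemediaobject>/,+1d
/<section id='18'>/,$d
EOF

cat > "$work/in.xml" <<'EOF'
<article>
<section>
<title>Audacity</title>
<para>Some translatable text.</para>
<para><ulink url="http://example.org/"/>
</para>
<para><inlinemediaobject><imageobject><imagedata fileref="a.png"/></imageobject></inlinemediaobject>
</para>
<para>More translatable text.</para>

<section id='18'>
<para>GNU GENERAL PUBLIC LICENSE</para>
</section>
</section>
</article>
EOF

# The two steps of strip-aud-untrans.sh.
sed -f "$work/strip-aud.sed" "$work/in.xml" > "$work/out.xml"
sed -i '$s%^$%</section>\n</article>%' "$work/out.xml"

result=$(cat "$work/out.xml")
rm -rf "$work"
printf '%s\n' "$result"
```

With this layout the result is well-formed again. If the line before
<section id='18'> is not empty in the real file, the re-append step
silently does nothing, so checking the stripped file with
xmllint --noout may be worthwhile.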

***4*** Discussion

The above proposal only strips a very small number of paragraphs.
More paragraphs could be stripped if the audacity manual were worded
slightly differently.

For example (this wording uses two paragraphs, which would make it possible
to strip the paragraph with the link):

   There is a frequently updated wiki version at:
   <ulink url="http://wiki.debian.org/DebianEdu/Documentation/Manuals/Audacity"/>

instead of (the current wording uses one paragraph):

The version at <ulink url="http://wiki.debian.org/DebianEdu/Documentation/Manuals/Audacity"/> is a
wiki and updated frequently.


One could also reword the translation copyright paragraphs (using two
paragraphs):

    The Dutch translation is released under the GPL v2 or any later version and is copyrighted by:
    Frans Spiesschaert (2014, 2018, 2019, 2020)

instead of (the current wording, in one paragraph):

The Dutch translation is copyrighted by Frans Spiesschaert (2014, 2018,
2019, 2020) and is released under the GPL v2 or any later version.

Doing so, one could keep the names of the translators out of the POT file.
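With the two-paragraph wording, one extra sed rule could drop the names
paragraph. A sketch, assuming the names paragraph is a single line directly
following a line that ends in "copyrighted by:</para>" (this markup is an
assumption about the XML, and the function name is hypothetical):

```sh
#!/bin/sh
# Hypothetical extra rule for strip-aud.sed: when a line ends with
# "copyrighted by:</para>", "n" prints it and loads the next line,
# and "d" deletes that next line (assumed to hold only the names).
strip_translator_names() {
    sed '/copyrighted by:<\/para>$/{n;d;}' "$@"
}

# Example with input lines built from the wording above (markup assumed):
printf '%s\n' \
    '<para>The Dutch translation is released under the GPL v2 or any later version and is copyrighted by:</para>' \
    '<para>Frans Spiesschaert (2014, 2018, 2019, 2020)</para>' \
    '<para>Next paragraph.</para>' | strip_translator_names
```

The sentence that introduces the names stays in the POT file, so it can
still be translated, while the names themselves never reach translators.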

***5*** End of the proposal. As mentioned before, thoughts and comments are welcome.

-- 
Kind regards,
Frans Spiesschaert


