
(New) manual translators: please use proper encoding



Hello,

I've seen your submissions for the Spanish and German manual translations.
Very good, but...

You should all check that you are using the proper encoding. The default 
encoding for XML documents is UTF-8.
NOTE: if you do not use proper encoding, your translation will not 'build'!

Currently Wolfgang and David are uploading documents that are encoded in 
ISO-8859-1. You can do this, but then you _have_ to explicitly set the 
encoding for the document (as explained below).
Bruno has submitted documents that use HTML entities (like &eacute;), which
will work but makes translating and editing a lot harder and is not necessary.

It is best to choose one of the following two options:
1. Use UTF-8 encoding (preferred).
   This means you will have to use an editor that can handle UTF-8.
   An example for this is the Greek (el) translation.
2. Use some other encoding (ISO-8859-1 should work for German and I think
   also for Spanish).
   In that case you _have_ to include the following as the first line in
   _every_ translated document:
      <?xml version="1.0" encoding="ISO-8859-1"?>
   Examples of this are the French (fr) and Dutch (nl) translations.
   (Note: watch out for /administrativa/contributors.xml; it is best to
    leave this document completely in UTF-8!)
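
To check a document before submitting it, something like the following
Python sketch may help. It is not part of the manual's build system; the
function names are made up for illustration, and it only looks at an
encoding declaration on the very first line. A document without one is
assumed to be UTF-8, as described above.

```python
import re

# Matches an XML declaration at the very start of the document and
# captures the encoding it names, e.g. ISO-8859-1.
DECL_RE = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration, or UTF-8 if none."""
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else "UTF-8"


def decodes_cleanly(data: bytes) -> bool:
    """True if the bytes are valid in the encoding the document declares."""
    try:
        data.decode(declared_encoding(data))
        return True
    except (UnicodeDecodeError, LookupError):
        # Invalid bytes for that encoding, or an unknown encoding name.
        return False
```

Note that any byte sequence is valid ISO-8859-1, so this catches option 1
mistakes (ISO-8859-1 bytes without a declaration) but not a UTF-8 document
that wrongly declares ISO-8859-1.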

Example (from the German translation):
<snip>
Wir freuen uns, dass Sie sich entschieden haben, Debian zu probieren,
und sind sicher, dass Sie die GNU/Linux Distribution von Debian
einzigartig finden. &debian; bringt qualitativ hochwertige freie
Software zusammen und bildet daraus ein zusammenhängendes Ganzes.
</snip>

When the manual is built, this results in the following error:
<snip>
../de/welcome/welcome.xml:10: parser error : Input is not proper UTF-8, 
indicate encoding !
Software zusammen und bildet daraus ein zusammenhängendes Ganzes.
                                                 ^
../de/welcome/welcome.xml:10: error: Bytes: 0xE4 0x6E 0x67 0x65
Software zusammen und bildet daraus ein zusammenhängendes Ganzes.
                                                 ^
</snip>

Saved as UTF-8, the same line will look like this if you view it in an
editor that is not set to UTF-8 encoding:
... und bildet daraus ein zusammenhÃ¤ngendes Ganzes.
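
The byte-level difference between the two encodings can be reproduced with
a few lines of Python. This is only an illustration using the word from the
example; the variable and function names are made up.

```python
word = "zusammenhängendes"

# The same word stored in the two encodings discussed in this mail.
latin1_bytes = word.encode("iso-8859-1")  # 'ä' is the single byte 0xE4
utf8_bytes = word.encode("utf-8")         # 'ä' is the two bytes 0xC3 0xA4


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The four bytes the parser complains about: 0xE4 0x6E 0x67 0x65.
offending = latin1_bytes[9:13]

# What an editor set to ISO-8859-1 shows when it opens the UTF-8 file.
misread = utf8_bytes.decode("iso-8859-1")
```

Here `is_valid_utf8(latin1_bytes)` is false, which is the condition behind
the parser error, and `misread` shows two characters where the umlaut
should be.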

Note: if you choose UTF-8, you can convert documents that are now in ISO 
encoding using the 'iconv' command.
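
On the command line that would be something along the lines of
`iconv -f ISO-8859-1 -t UTF-8 old.xml > new.xml` (file names are
placeholders). If you prefer a script, the same conversion can be sketched
in Python; the function name and the default source encoding are
assumptions, so adjust them to your files.

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "iso-8859-1") -> None:
    """Rewrite one file in place as UTF-8.

    Reads and writes raw bytes so that line endings are left untouched.
    """
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
```

If a converted document starts with an encoding="ISO-8859-1" declaration,
remember to remove or update that line afterwards; this sketch does not
touch it.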


One other thing.
To make checking for changes in the original English documents easier, please 
replace the following lines:
   <!-- retain these comments for translator revision tracking -->
   <!-- $Id: welcome.xml 12756 2004-04-06 22:23:56Z fjpop-guest $ -->
with the following line
   <!-- original version: 12756 -->
in every document that you translate.
Note: the number in the new line should be the same as the revision number 
(after the file name) in the second old line.
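
If you have many documents to update, the replacement can be scripted. The
following Python sketch is one possible way to do it, assuming the two old
lines appear exactly as shown above and directly after each other; the
function name is made up, and you should check the result by hand.

```python
import re

# The two old comment lines; the revision number after the file name in
# the $Id$ line is captured as group 1.
OLD_HEADER_RE = re.compile(
    r"<!-- retain these comments for translator revision tracking -->[ \t]*\r?\n"
    r"[ \t]*<!-- \$Id: \S+ (\d+) [^>]*-->"
)


def mark_original_version(text: str) -> str:
    """Replace the two old comment lines with one 'original version' line."""
    return OLD_HEADER_RE.sub(r"<!-- original version: \1 -->", text, count=1)
```

A document that does not contain the old lines is returned unchanged.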

Cheers,

FJP


