
Re: support for multilingual Packages files?



Hi,

At Wed, 04 Jul 2001 11:31:45 +0300,
Shaul Karl <shaulka@bezeqint.net> wrote:

> 1) It is my opinion that only the Description field and the fields names 
> should have a translation. Does this sounds reasonable?

Yes, it would be nice to have a translation mechanism for Description
fields (both titles and contents).

> 2) An important issue that should be agreed upon is the character encoding 
> scheme of these i18n files. If I remember correctly the `Introduction to i18n' 
> suggests that UTF-8 should be chosen.

There are three possibilities, I think.

1. Use locale-dependent encodings.  Since different encodings cannot
   coexist in a single file, translations will have to be separated into
   different files.  So far, almost all translation-related things, such
   as man pages, info pages, message catalogs (aka gettext), debconf
   templates, and so on, take this approach.  I have heard that some
   translation mechanisms (old Gnome?) violate this rule (i.e., they
   include different encodings in one file) and annoy translators
   (translations with different encodings are sometimes broken).
2. Use UTF-8, a universal encoding.  This enables translations to be
   included in one file.  However, software that handles Descriptions
   will need to convert from UTF-8 to locale-dependent encodings.
   Fortunately, GNU libc (since version 2.2) supplies nl_langinfo() and
   iconv() for this purpose.
3. Use ISO-2022, another universal encoding.  Like UTF-8, this will
   require encoding conversion.  (iconv(3) of GNU libc doesn't support
   ISO-2022.)
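
As a rough illustration of what (2) asks of Description-handling
software, here is a minimal sketch in Python.  It is not taken from any
existing tool: the function names are hypothetical, and Python's codec
machinery stands in for iconv(), while locale.nl_langinfo() wraps the
C function of the same name.

```python
import locale


def locale_codeset():
    """Return the codeset name of the user's locale (like nl_langinfo(CODESET))."""
    try:
        # Pick up LANG / LC_* from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale in the environment: stay in the default locale.
        pass
    return locale.nl_langinfo(locale.CODESET)


def description_for_display(utf8_bytes, codeset):
    """Convert a UTF-8 Description (hypothetical helper) to `codeset`.

    Characters that the target encoding cannot represent are replaced
    with '?' instead of aborting the conversion.
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode(codeset, errors="replace")


if __name__ == "__main__":
    stored = "Éditeur de texte".encode("utf-8")  # as stored in the file
    print(description_for_display(stored, locale_codeset()))
```

The codeset is passed as a parameter so that the conversion step can be
tried with any encoding name, independent of the current locale.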

I think (2) is the best, as you wrote, since the advantage of (1) will
diminish in the future: (1) assumes one fixed encoding for each
language (for example, ISO-8859-1 for French, EUC-JP for Japanese,
KOI8-R for Russian, ...).  Moreover, we might use UTF-8 for all languages
in the future.  We are moving in this direction, though I don't know
how many years we will need to complete this migration to UTF-8.  (Many
mechanisms, like man pages and message catalogs, assume such a "fixed
encoding for one language", and we will need great effort and cooperation
with upstreams for this migration.)
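
The "one fixed encoding for one language" assumption, and the reason
such encodings cannot share a file, can be shown with a small Python
sketch.  The table and the sample strings are made up for illustration;
only the language/encoding pairs come from the examples above.

```python
# Hypothetical table: one fixed legacy encoding per language, as in (1).
LEGACY_ENCODING = {"fr": "ISO-8859-1", "ja": "EUC-JP", "ru": "KOI8-R"}


def encode_legacy(lang, text):
    """Encode `text` in the fixed legacy encoding assumed for `lang`."""
    return text.encode(LEGACY_ENCODING[lang])


french = "résumé"
russian = "пакет"

# A file mixing two legacy encodings carries no hint of where one
# encoding ends and the next begins.
mixed = encode_legacy("fr", french) + b"\n" + encode_legacy("ru", russian)

# Reading the whole file as KOI8-R recovers the Russian line but
# garbles the accented letters of the French line.
as_koi8 = mixed.decode("koi8-r").split("\n")

# With UTF-8, both lines fit in one file and decode consistently.
unified = (french + "\n" + russian).encode("utf-8")
as_utf8 = unified.decode("utf-8").split("\n")

if __name__ == "__main__":
    print(as_koi8)
    print(as_utf8)
```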

The drawback of (2) is that related software will have to implement
encoding conversion, and that encoding conversion facilities sometimes
lack portability.

The portability problem is that software has to use nl_langinfo(CODESET)
and iconv(), and iconv_open() has to accept conversion between UTF-8 and
the locale encodings.  Moreover, the names of these encodings are not
standardized.  However, if we can limit portability to GNU libc systems,
this is not a problem.  Also, the cost of (2), namely that software will
have to implement encoding conversion, is limited because only a small
number of programs handle the Description field.
(Imagine migrating man pages to UTF-8.  Unlike the Description field,
there are many man pages which are written in non-ASCII encodings.
You would have to modify man parsers and browsers to assume UTF-8,
and you would have to convert ALL man pages to UTF-8 at the same time.
Otherwise your system would not work correctly.  If you think about
asking upstream to change man pages to UTF-8, you will also have
to think about migrating the encoding of man pages all over the world,
including proprietary OSes, at the same time!)
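
The naming problem can be illustrated with a short Python sketch.
Python's codec alias table is used here only as a stand-in for whatever
name matching a given iconv_open() performs; the helper name and the
sample spellings are assumptions for illustration, not a survey of what
real systems return from nl_langinfo(CODESET).

```python
import codecs


def same_codeset(name_a, name_b):
    """Return True if both names resolve to the same known codec.

    Unknown names are treated as "not the same" rather than raising.
    """
    try:
        return codecs.lookup(name_a).name == codecs.lookup(name_b).name
    except LookupError:
        return False


if __name__ == "__main__":
    # Different spellings that one might meet for the same encoding.
    for a, b in [("eucJP", "EUC-JP"),
                 ("ISO8859-1", "ISO-8859-1"),
                 ("UTF-8", "KOI8-R")]:
        print(a, b, same_codeset(a, b))
```

A program that has to run outside GNU libc would need some table of
this kind before handing the names to iconv_open().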

P.S. Thanks for referring to my document. :-)

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


