
Re: I18N (Multibyte Enablation) of debconf


From: barbier@linuxfr.org (Denis Barbier)
Subject: Re: I18N (Multibyte Enablation) of debconf
Date: Tue, 5 Aug 2003 23:58:53 +0200

> >    http://libtextwrap.sourceforge.net/
> Looks fine.  I would certainly be interested in discussing some issues,
> will you create a ML or should discussions belong here?

OK, I will create one.  However, I would like to write an initial
sample implementation before creating a mailing list; otherwise the
discussion might be unrealistic.
(Until I implement it, I won't know what is realistic.)
That said, I could easily be persuaded otherwise on this point. :-)  Any ideas?

> Right, but there is another problem: when a translated text is
> retrieved (either via the debconf protocol or by gettext()),
> you do not know the language of this translated text.

My plan is to port my Text::WrapI18N module from Perl to C.
It doesn't use language information.  In other words, I will not
implement a very rich algorithm; it depends only on characters, not
on languages.

I think "support for languages without whitespace" can be implemented
roughly by classifying characters, as follows:
  - CJK ideographs, Hiragana, Katakana -> line-breakable characters
  - CJK punctuation -> problematic
  - Whitespace and hyphens -> line-breakable characters
  - Others -> non-line-breakable characters
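A minimal C sketch of the classification above follows. It is not code from Text::WrapI18N: the enum and the function names are hypothetical, the input is assumed to be already decoded into Unicode code points, and the ranges are an approximation chosen for illustration (U+3400..U+4DBF and U+4E00..U+9FFF for ideographs, U+3040..U+309F for Hiragana, U+30A0..U+30FF for Katakana, and the CJK Symbols and Punctuation block U+3000..U+303F, with U+3000 treated as whitespace).

```c
#include <stdint.h>

/* Three classes mirroring the list above (names are hypothetical). */
enum lb_class {
    LB_BREAKABLE,     /* a line may be folded next to this character */
    LB_PROBLEMATIC,   /* CJK punctuation, shared by C, J and K */
    LB_NONBREAKABLE   /* everything else, e.g. Latin letters, Hangul */
};

static int in_range(uint32_t c, uint32_t lo, uint32_t hi)
{
    return c >= lo && c <= hi;
}

/* Classify one Unicode code point by its block only, ignoring language. */
enum lb_class lb_classify(uint32_t c)
{
    /* Whitespace and hyphens (U+3000 is the ideographic space). */
    if (c == 0x0020 || c == 0x0009 || c == 0x3000 ||
        c == 0x002D || c == 0x2010)
        return LB_BREAKABLE;

    /* CJK ideographs (Extension A and the main block), Hiragana, Katakana. */
    if (in_range(c, 0x3400, 0x4DBF) || in_range(c, 0x4E00, 0x9FFF) ||
        in_range(c, 0x3040, 0x309F) || in_range(c, 0x30A0, 0x30FF))
        return LB_BREAKABLE;

    /* Rest of the CJK Symbols and Punctuation block. */
    if (in_range(c, 0x3001, 0x303F))
        return LB_PROBLEMATIC;

    return LB_NONBREAKABLE;
}
```

For example, U+6F22 (漢) and U+3042 (あ) come out as breakable, U+3002 (。) as problematic, and Latin letters or Hangul such as U+D55C (한) as non-breakable, since Korean is left to the whitespace rule.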

CJK punctuation is problematic because it is commonly used in
Chinese, Japanese, and Korean alike.  While Chinese and Japanese don't
use whitespace between words (and lines can be folded at any point),
Korean uses whitespace between words.

My idea is that most CJK punctuation should be classified as
non-line-breakable.  This does no harm to Chinese and Japanese, because
punctuation marks are not frequent enough to disturb line folding, and
it suits Korean, whose lines are folded at whitespace anyway.
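One way to apply "punctuation is non-line-breakable" when folding a line is sketched below. The rules are assumptions made for this illustration, not Text::WrapI18N's actual logic: a break is allowed after whitespace or a hyphen and between two adjacent ideographs or kana, and nowhere else. Display width is approximated as 2 columns for the CJK and Hangul Syllables ranges and 1 otherwise; a real port would more likely ask the C library (e.g. `wcwidth()`). All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static int in_range(uint32_t c, uint32_t lo, uint32_t hi)
{
    return c >= lo && c <= hi;
}

/* Whitespace (including U+3000 ideographic space) and hyphens. */
static int is_space_or_hyphen(uint32_t c)
{
    return c == 0x0020 || c == 0x0009 || c == 0x3000 ||
           c == 0x002D || c == 0x2010;
}

/* CJK ideographs, Hiragana and Katakana.  CJK punctuation
 * (U+3001..U+303F) is deliberately left out: it is treated as
 * non-line-breakable, like Latin letters or Hangul. */
static int is_cjk_breakable(uint32_t c)
{
    return in_range(c, 0x3400, 0x4DBF) || in_range(c, 0x4E00, 0x9FFF) ||
           in_range(c, 0x3040, 0x309F) || in_range(c, 0x30A0, 0x30FF);
}

/* Hypothetical width rule: 2 columns for CJK and Hangul, else 1. */
static size_t columns(uint32_t c)
{
    return (is_cjk_breakable(c) || in_range(c, 0x3000, 0x303F) ||
            in_range(c, 0xAC00, 0xD7A3)) ? 2 : 1;
}

/* May the line be folded between `prev` and `next`? */
int lb_can_break(uint32_t prev, uint32_t next)
{
    if (is_space_or_hyphen(prev))
        return 1;               /* after whitespace or a hyphen */
    if (is_cjk_breakable(prev) && is_cjk_breakable(next))
        return 1;               /* between two ideographs or kana */
    return 0;                   /* everything else stays glued */
}

/* Number of code points that go on the first line when s[0..n) is
 * folded at `width` columns: the last break opportunity that fits,
 * or a hard cut if there is none.  Returns at least 1 when n > 0. */
size_t lb_first_line(const uint32_t *s, size_t n, size_t width)
{
    size_t used = 0, last_break = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && lb_can_break(s[i - 1], s[i]))
            last_break = i;     /* the line could end just before s[i] */
        used += columns(s[i]);
        if (used > width)       /* s[i] does not fit on this line */
            return last_break > 0 ? last_break : (i > 0 ? i : 1);
    }
    return n;                   /* everything fits */
}
```

With these rules, folding 漢字。漢字 at 5 columns puts only 漢 on the first line, because 字 and 。 stay together, while Korean text such as 한국 어 is folded only at the space.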

Some languages, such as Thai, need a much more difficult algorithm,
and for those I think I will need help from native speakers.  Thai is
difficult because it doesn't use whitespace between words, yet lines
must be folded between words.

I won't support hyphenation because I think it is too much for basic
tty software; otherwise, my library would become as large as TeX.

Tomohiro KUBOTA <kubota@debian.org>
