
Debconf i18n


Recently I have been working on LC_CTYPE encoding (including UTF-8)
support in Debconf.  In other words, I am trying to fix Bug#148490.
I'd like to report what I am doing, and I have a few questions.

The project consists of four parts:

   1. a way to determine the encoding of gettext messages and
      debconf templates.

   2. conversion of gettext messages and debconf templates into the
      current LC_CTYPE encoding.

   3. a new line-wrapping mechanism.

   4. output support for each user interface (dialog/readline/editor/gnome).

For (1), I have asked which encodings are currently used for debconf
templates in each language, in
http://lists.debian.org/debian-i18n/2002/debian-i18n-200207/msg00002.html .
http://bugs.debian.org/148490 also has a tentative list of languages
and encodings.  Gettext .po files already declare their own encoding.
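
As a rough illustration of the last point, here is a minimal sketch in
Python (not Debconf's Perl, and not part of my prototype) of reading the
charset from a .po header.  The helper name po_charset and the sample
header are made up for this example; it only assumes the usual
"Content-Type: text/plain; charset=..." header line.

```python
import re

# Matches the charset value in a .po "Content-Type" header line.
_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def po_charset(po_bytes, default="ASCII"):
    """Return the charset named in the Content-Type header, or `default`.

    Hypothetical helper for illustration; works on raw bytes because the
    encoding is not known until the header has been read.
    """
    for line in po_bytes.splitlines():
        if b"Content-Type:" in line:
            match = _CHARSET_RE.search(line)
            if match:
                return match.group(1).decode("ascii")
    return default


# Invented sample header, in the usual .po layout.
sample = (
    b'msgid ""\n'
    b'msgstr ""\n'
    b'"Project-Id-Version: debconf\\n"\n'
    b'"Content-Type: text/plain; charset=EUC-JP\\n"\n'
    b'"Content-Transfer-Encoding: 8bit\\n"\n'
)

print(po_charset(sample))
```

Debconf templates have no such header, which is why the per-language
list of encodings is needed for them.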

For (2), I wrote a small prototype script.  It is available at
http://bugs.debian.org/148490 .  On my machine, I have integrated
the script into debconf (version 1.1.14) and it works well.
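
The idea behind the conversion can be sketched as follows.  This is a
Python illustration only, not the prototype script itself; the function
name to_locale_encoding is invented here, and replacing unconvertible
characters with "?" is an assumption about what a reasonable fallback
would be.

```python
import locale


def to_locale_encoding(raw, source_encoding, target_encoding=None):
    """Convert bytes from `source_encoding` to the locale's encoding.

    Hypothetical helper: if `target_encoding` is None, the encoding of
    the current locale is used.  Characters that the target encoding
    cannot represent are replaced with "?" (an assumed fallback).
    """
    if target_encoding is None:
        target_encoding = locale.getpreferredencoding(False)
    text = raw.decode(source_encoding)
    return text.encode(target_encoding, errors="replace")


# A Japanese template stored in EUC-JP, shown in a UTF-8 locale.
euc_jp_bytes = "日本語".encode("euc_jp")
utf8_bytes = to_locale_encoding(euc_jp_bytes, "euc_jp", "utf-8")
print(utf8_bytes.decode("utf-8"))
```

When the target encoding cannot represent the text at all (for example
a Japanese template in an ISO-8859-1 locale), it would probably be
better to fall back to the English template than to show a row of "?".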

For (3), we will have to think about the following things:
   a. multibyte characters (in UTF-8, EUC-JP, EUC-KR, GB2312, Big5,...)
   b. characters which occupy two or zero columns (East Asian and
      combining characters)
   c. languages without word-separating whitespace (Korean and Japanese)
I think I can write a line-breaking algorithm for Debconf, instead of
using Text::Wrap .  However, I have not started yet.  (I am afraid
I am reinventing the wheel...  Is there any information on this field?)
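
To make points (a) to (c) concrete, here is a minimal sketch of
column-aware wrapping.  It is written in Python for illustration, not
in Perl, and it is not an existing Debconf routine; all function names
are invented.  It assumes that wide characters may be broken anywhere,
and it ignores rules such as punctuation that must not start a line.

```python
import unicodedata


def char_width(ch):
    """Columns used by one character: 0, 1 or 2."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth East Asian characters
    return 1


def text_width(text):
    """Total number of columns used by a string."""
    return sum(char_width(ch) for ch in text)


def tokenize(text):
    """Split text into unbreakable units: words, wide chars, spaces."""
    tokens = []  # list of [kind, string]
    for ch in text:
        if ch.isspace():
            # Collapse runs of whitespace; drop leading whitespace.
            if tokens and tokens[-1][0] != "space":
                tokens.append(["space", " "])
            continue
        width = char_width(ch)
        if width == 0 and tokens and tokens[-1][0] != "space":
            tokens[-1][1] += ch  # attach a combining mark to its base
        elif width == 2:
            tokens.append(["wide", ch])  # a break is allowed around it
        elif tokens and tokens[-1][0] == "word":
            tokens[-1][1] += ch  # continue the current narrow word
        else:
            tokens.append(["word", ch])
    return tokens


def wrap_columns(text, width):
    """Greedy wrap by display columns instead of character count."""
    lines, line, cols = [], "", 0
    pending_space = False
    for kind, tok in tokenize(text):
        if kind == "space":
            pending_space = bool(line)
            continue
        tok_cols = text_width(tok)
        sep = 1 if pending_space else 0
        if line and cols + sep + tok_cols > width:
            lines.append(line)  # start a new line, dropping the space
            line, cols = tok, tok_cols
        else:
            line += (" " if sep else "") + tok
            cols += sep + tok_cols
        pending_space = False
    if line:
        lines.append(line)
    return lines


for row in wrap_columns("Debian 日本語のテキスト", 8):
    print(row)
```

A word longer than the line width is left on a line of its own rather
than split, which a real implementation would have to handle.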

I think it is difficult to achieve (4) now, because these user
interfaces have poor multibyte/doublewidth/combining/bidi/...
support so far.  However, I think two of the interfaces, dialog and
readline, could be ready in time for the next release after Woody.

For the 'readline' interface, we don't need to do anything.  Although
'readline' doesn't support multibyte/doublewidth/combining/bidi
characters, I'd like to leave *input* support for these characters
aside for now (i.e., *display* support only).  Thus, apart from poor
line breaking and unconverted gettext messages, my test version of
Debconf works well in a UTF-8 locale (i.e., debconf templates are
converted from their own encodings into UTF-8 on the fly).

For the 'dialog' interface, I found I have to use the UTF-8 versions
of newt/slang/whiptail.  What makes me happy is that these packages
work well not only for UTF-8 but also for conventional 8-bit
encodings and popular Asian multibyte encodings.

Here is my question: why do non-UTF-8 versions of newt/slang/whiptail
exist?  Are there any features which the non-UTF-8 versions have but
the UTF-8 versions don't?

Sorry for the long message; in short,

 * I am working on Debconf i18n.  Conversion of debconf templates
   into the LC_CTYPE locale's encoding seems to work well.

 * Is there any information on an i18n line-breaking subroutine or
   package for Perl?

 * Why do non-UTF-8 versions of newt/slang/whiptail exist?

Tomohiro KUBOTA <kubota@debian.org>
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
