Recently I have been working on LC_CTYPE encoding (including UTF-8)
support in Debconf. In other words, I am trying to fix Bug#148490.
I'd like to report what I am doing, and I have a few questions.
The project consists of four parts:
1. a way to determine the encoding of gettext messages
and debconf templates.
2. conversion of gettext messages and debconf templates into the
current LC_CTYPE locale's encoding.
3. a new line-wrapping mechanism.
4. output support for each user interface (dialog/readline/editor/gnome).
For (1), I have asked which encodings are currently used for debconf
templates in each language.
http://bugs.debian.org/148490 also has a tentative list of languages
and encodings. Gettext .po files carry their own encoding information.
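
As an illustration of the last point, here is a minimal sketch of
reading the declared charset from a .po header. It is written in
Python only for illustration (Debconf itself is Perl), and the
function name po_charset is hypothetical. It assumes the header
contains the usual "Content-Type: text/plain; charset=..." line and
that the file's encoding is ASCII-compatible.

```python
import re


def po_charset(po_bytes, default=None):
    """Return the charset declared in a .po header, or `default`.

    Hypothetical helper: it looks for the header line
        "Content-Type: text/plain; charset=EUC-JP\\n"
    in the raw bytes. The header itself is plain ASCII, so no decoding
    is needed before searching (assuming an ASCII-compatible encoding).
    """
    match = re.search(
        rb'Content-Type:[^"\n]*charset=([A-Za-z0-9_.:-]+)', po_bytes
    )
    if match is None:
        return default
    return match.group(1).decode("ascii")
```

Debconf templates have no such header, which is why a separate list
of languages and encodings is needed for them.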
For (2), I wrote a small prototype script. It is available at
http://bugs.debian.org/148490 . On my machine, I have integrated
the script into debconf (version 1.1.14), and it works well.
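
The idea behind the conversion can be sketched as follows. This is
not the prototype script itself, only a minimal Python illustration
under stated assumptions: the names locale_encoding and convert are
hypothetical, the source encoding is already known (see (1)), and
characters that the target encoding cannot represent are replaced
with "?". locale.nl_langinfo is available on Unix-like systems only.

```python
import locale


def locale_encoding():
    """Return the codeset of the user's LC_CTYPE locale (Unix only)."""
    # Adopt the LC_CTYPE setting from the environment, then query it.
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


def convert(raw, source_encoding, target_encoding):
    """Re-encode `raw` bytes from source_encoding to target_encoding.

    Characters missing from the target encoding become "?" instead of
    raising an error, so that a message can always be displayed.
    """
    text = raw.decode(source_encoding)
    return text.encode(target_encoding, errors="replace")
```

A template stored in EUC-JP would then be shown to the user as
convert(raw, "EUC-JP", locale_encoding()).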
For (3), we will have to think about the following things:
a. multibyte characters (in UTF-8, EUC-JP, EUC-KR, GB2312, Big5, ...)
b. characters which occupy two or zero columns (East Asian and combining
characters)
c. languages without word-separating whitespace (Korean and Japanese)
I think I can write a line-breaking algorithm for Debconf instead of
using Text::Wrap. However, I have not started yet. (I am afraid
I am reinventing the wheel... Is there any information in this field?)
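
To make points (a) to (c) concrete, here is a rough sketch of a
width-aware wrapper. It is in Python for illustration only (not
Perl, and not a replacement for Text::Wrap), and all function names
are hypothetical. It assumes the text has already been decoded into
characters, treats East Asian "wide" and "fullwidth" characters as
two columns and combining characters as zero, and allows a break at
whitespace or on either side of a wide character. It ignores rules
such as not starting a line with closing punctuation; Unicode
Standard Annex #14 describes a fuller line-breaking algorithm.

```python
import unicodedata


def char_width(ch):
    """Approximate number of terminal columns used by one character."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn on the preceding character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters
    return 1      # everything else, including "ambiguous" width


def text_width(s):
    return sum(char_width(c) for c in s)


def split_units(text):
    """Split text into unbreakable units.

    Returns [unit, space_before] pairs. A line break is allowed
    before every unit: at whitespace, and next to wide characters.
    """
    units = []
    for word in text.split():
        space_before = True
        prev_wide = False
        for ch in word:
            w = char_width(ch)
            if space_before or w == 2 or (w == 1 and prev_wide):
                units.append([ch, space_before])
            else:
                units[-1][0] += ch  # narrow run or combining mark
            space_before = False
            if w:
                prev_wide = (w == 2)
    return units


def wrap(text, columns):
    """Greedy wrap; a unit wider than `columns` gets a line of its own."""
    lines, line, width = [], "", 0
    for unit, space_before in split_units(text):
        sep = " " if (space_before and line) else ""
        needed = len(sep) + text_width(unit)
        if line and width + needed > columns:
            lines.append(line)
            line, width = unit, text_width(unit)
        else:
            line += sep + unit
            width += needed
    if line:
        lines.append(line)
    return lines
```

Because the width is counted in columns rather than bytes or
characters, the same routine works for text that came from any of
the multibyte encodings listed in (a).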
I think it is difficult to achieve (4) now, because these user
interfaces have poor multibyte/doublewidth/combining/bidi/...
support so far. However, I think two of the interfaces, dialog and
readline, could be in time for next release to Woody.
For the 'readline' interface, we don't need to do anything. Though
'readline' doesn't support multibyte/doublewidth/combining/bidi
characters, I'd like to exclude *input* support for these characters
for now (i.e., *display* support only). Thus, apart from poor line
breaking and unconverted gettext messages, my test version of
Debconf works well in a UTF-8 locale (i.e., debconf templates are
converted from their own encodings into UTF-8 on the fly).
For the 'dialog' interface, I found I have to use the UTF-8 versions
of newt/slang/whiptail. What makes me happy is that these packages
work well not only for UTF-8 but also for conventional 8-bit
encodings and popular Asian multibyte encodings.
Here is my question: why do non-UTF-8 versions of newt/slang/whiptail
exist? Are there any features which the non-UTF-8 versions have but
the UTF-8 versions don't?
Sorry for the long message; in short,
* I am working on Debconf i18n. Conversion of debconf templates
into the LC_CTYPE locale's encoding seems to work well.
* Any information on an i18n line-breaking subroutine or package
is welcome.
* Why do non-UTF-8 versions of newt/slang/whiptail exist?
Tomohiro KUBOTA <firstname.lastname@example.org>
"Introduction to I18N" http://www.debian.org/doc/manuals/intro-i18n/