
Re: I18N (Multibyte Enablation) of debconf


Thank you for checking my patch.

From: Joey Hess <joeyh@debian.org>
Subject: Re: I18N (Multibyte Enablation) of debconf
Date: Tue, 17 Jun 2003 12:17:16 -0400

> Your patch implements its own word wrapper. Did you consider using
> perl's UTF-8 support, and Text::Wrap?

There are several reasons why I didn't take that approach.
 - Perl's own i18n depends on its own mapping tables, which are
   contained in the "perl" package (not in the "perl-base" package).
   The tables are relatively large, and I didn't want debconf to
   require that much disk space just for translation.
 - Since the current implementation of debconf does all internal
   processing in locale encodings (not in UTF-8), I just followed
   that policy.
 - Even if I thought the policy should be changed, I would not
   be able to modify debconf without introducing many bugs, because
   such a change of policy would need extensive modification.

> Debconf does not currently let
> perl know what encoding is used by text, but if it did let perl know,
> then Text::Wrap should be able to get multibyte characters right.  I
> don't know if it would handle multi-column characters or low-density
> whitespace text.

Your concern is justified.  Text::Wrap can handle neither multi-column
characters nor low-density whitespace text, even in UTF-8 mode.
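
As a rough illustration of what a wrapper has to do for such text, here
is a minimal sketch.  It is written in Python rather than Perl and is not
the code from the patch.  The names char_width(), tokenize() and wrap()
are hypothetical, and the width rule (East Asian Wide or Fullwidth
characters take two columns, everything else one) is a simplifying
assumption that ignores combining characters.

```python
import unicodedata


def char_width(ch):
    """Assumed rule: East Asian Wide/Fullwidth take 2 columns, others 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def tokenize(text):
    """Yield (token, preceded_by_space) pairs.

    A run of narrow non-space characters is one unbreakable token.
    Every wide character is its own token, so a line may break before
    or after it even when the text contains no spaces at all.
    """
    word, spaced = "", False
    for ch in text:
        if ch.isspace():
            if word:
                yield word, spaced
                word = ""
            spaced = True
        elif char_width(ch) == 2:
            if word:
                yield word, spaced
                word, spaced = "", False
            yield ch, spaced
            spaced = False
        else:
            word += ch
    if word:
        yield word, spaced


def wrap(text, columns):
    """Greedy wrap that counts terminal columns instead of characters."""
    lines, line, used = [], "", 0
    for token, spaced in tokenize(text):
        width = sum(char_width(c) for c in token)
        sep = 1 if (spaced and line) else 0
        if line and used + sep + width > columns:
            lines.append(line)          # token does not fit: start a new line
            line, used = token, width
        else:
            line += (" " if sep else "") + token
            used += sep + width
    if line:
        lines.append(line)
    return lines


if __name__ == "__main__":
    for row in wrap("日本語のテキスト", 6):
        print(row)
```

The sketch leaves out things a real wrapper needs, such as rules about
which punctuation may not start a line, and a token wider than the
requested width is simply placed on a line of its own.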

> Anyway, I'd feel better if I knew you'd at least
> considered doing things this way. If perl is indeed lacking a word
> wraping module that is suitable for all languages, then it would be
> better to add such a module to perl, than just to debconf.

I think perl will need to have such an ability.  However, perl's own
i18n policy needs a lot of disk space, which is not very good for
debconf.  Thus, even in the future, when perl supports all of (1) - (3)
above, it would still need a lot of disk space.  This is why I think it
is better to implement a debconf-specific lightweight mechanism than a
universal mechanism that complies with perl's policy.

> Your Debconf::Wrap module should export the functions it provides. Code
> that calls an exported wrap() would then just work, unchanged.

I am not attached to implementation details.  You may well be right.

> It seems to me there must be a better way to expose wcwidth to perl than
> implementing your own with a table. Have you considered an XS library?

I have not considered this.  It may be a good idea, but there may be
some problems.  (a) wcwidth() is based on wchar_t, whose internal
representation is system- and locale-dependent.  If debconf wants to
be portable, this would be difficult to implement.  (There is no
portable way to convert from Unicode to wchar_t.)  (b) Even if
debconf doesn't want to be portable and glibc (where wchar_t is
always UCS-4) is the only system debconf runs on, conversion
between locale encodings and UCS-4 will be needed.  (c) The current
version of debconf doesn't contain any architecture-dependent code.
Using XS will break this, or we will have to use a separate package
to keep the main part of debconf architecture-independent.

However, I understand your basic idea that it would be better to remove
the width table from debconf.  What do you think about points (b) and
(c)?  I.e., do you want debconf to be portable?  Do you want the
debconf package to be architecture-independent?

> I haven't a chance of being able to maintain the code in Debconf::Wrap.
> If debconf does use it we will have to come up with some mechanism for
> you to maintain it, either a shared subversion repository or putting it
> in a separate package which debconf could use.

Either is OK for me.  Which do you prefer?  (If we go with a separate
package, problem (c) above will be solved.  However, since it might
not be able to share $Debconf::Encoding::charmap, the interface might
not be compatible with Text::Wrap.)

> I suspect that this word wrapper is a fair bit slower than Text::Wrap.

There may be room for optimization, but it cannot be as fast as
the original version, because we have to abandon many assumptions,
such as "length($string) is equal to the visible width (the number
of columns occupied) on the terminal".
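
To make that assumption concrete, the following sketch compares the
character count, the byte count in two encodings, and an approximate
column count for one mixed string.  It is in Python, not Perl, and
display_width() is a hypothetical helper.  Its rule (combining
characters take no column, East Asian Wide or Fullwidth characters take
two, everything else one) is an assumption, not the width table from
the patch.

```python
import unicodedata


def display_width(text):
    """Approximate number of terminal columns occupied by text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # assumed to overlay the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


if __name__ == "__main__":
    sample = "debconf 設定"
    print("characters:   ", len(sample))
    print("UTF-8 bytes:  ", len(sample.encode("utf-8")))
    print("EUC-JP bytes: ", len(sample.encode("euc_jp")))
    print("columns:      ", display_width(sample))
```

The four numbers differ in general, so a wrapper that works on strings
in a locale encoding cannot use the string length as the width.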

Tomohiro KUBOTA <kubota@debian.org>
