Re: debconf w/ charset encoding support

Tomohiro KUBOTA wrote:
> I applied the following patch.  Then, 
> "LANG=ja_JP.UTF-8 dpkg-reconfigure debconf" with modified
> ("Description-ja:" -> "Description-ja.UTF-8:") templates in
> UTF-8 terminal worked partly well.  I don't know whether
> this is the right place for this modification or even I don't
> know whether this modification has some bad side-effects.
> Also, I think other programs than dpkg-reconfigure will need
> similar modification.

I think that binmode should probably be set in
/usr/share/perl5/Debconf/Templates.pm. Could you try it there?
If that doesn't work, it may need to be put in
/usr/share/perl5/Debconf/Db.pm, but I hope we can avoid that.

> By using this patch, almost displayed messages are good, but
> I found that the line-breaking algorithm must be improved to
> handle UTF-8 (and other encodings).  Text::Wrap cannot be used.

Of course perl itself supports Unicode. It can even do encoding
conversion without an add-on module such as Text::Iconv. I do not
know whether it uses its own internal tables for this, but an strace
of perl calling Encode::from_to() does not show it calling
iconv() or gconv().
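
A minimal sketch of the kind of conversion meant here, assuming a perl
with the core Encode module (which ships EUC-JP support). The sample
string and the choice of EUC-JP as target are illustrative assumptions,
not taken from the patch under discussion.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# "日本語" as raw UTF-8 bytes: three characters, three bytes each.
my $octets = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";

# from_to() converts the byte string in place and returns the new
# length in bytes, or undef if the conversion fails.
my $len = from_to($octets, "UTF-8", "EUC-JP");
die "conversion failed\n" unless defined $len;

# Print the result as hex so it is readable on any terminal.
printf "%d bytes after conversion: %s\n", $len, unpack("H*", $octets);
```

Note that from_to() works on bytes only; the string never becomes a
perl character string, so it is not enough by itself for wrapping.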

Anyway, since perl does support Unicode, it knows about wide characters
and so on. Perl regular expressions also support Unicode, and operate on
characters instead of bytes. Perl's Unicode regex support is not
complete yet, and it may not support everything used in the Text::Wrap
regex. If it does, though, Text::Wrap should in theory work better
with wide characters when given a string whose encoding perl knows.
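
The difference between bytes and characters can be seen with a small
sketch. It assumes the core Encode module; the sample string is an
illustrative assumption.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "日本語" as raw UTF-8 bytes, then decoded into a perl character string.
my $octets = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";
my $chars  = decode("UTF-8", $octets);

# On the undecoded string, "." matches one byte at a time.
my $byte_matches = () = $octets =~ /./gs;

# On the decoded string, "." matches one whole character at a time.
my $char_matches = () = $chars =~ /./gs;

print "regex matches on bytes: $byte_matches, on characters: $char_matches\n";
```

Counting characters is still not the same as counting terminal
columns, since a wide character usually occupies two columns, so this
alone does not settle the line-breaking question.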

Encode::decode() should be able to convert a string from a given
encoding into perl's own internal Unicode representation. Text::Wrap
can then do its thing, and Encode::encode() can convert the result to
the display encoding. Whether this will work in practice I don't know.
It seems like a promising approach though.
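
The decode, wrap, encode sequence could be sketched roughly as
follows. This is untested against debconf itself; wrap_encoded() is a
hypothetical helper name, and the encodings and column width in the
example are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Text::Wrap qw(wrap);

# Hypothetical helper: take bytes in encoding $from, wrap the text on
# characters, and return bytes in the display encoding $to.
sub wrap_encoded {
    my ($octets, $from, $to, $columns) = @_;
    local $Text::Wrap::columns = $columns;
    my $text    = decode($from, $octets);   # bytes -> perl characters
    my $wrapped = wrap("", "", $text);      # break lines on characters
    return encode($to, $wrapped);           # characters -> display bytes
}

# "café " six times, as UTF-8 bytes; display encoding is ISO-8859-1.
my $in  = "caf\xc3\xa9 " x 6;
my $out = wrap_encoded($in, "UTF-8", "ISO-8859-1", 20);

# Report line lengths in characters rather than printing raw bytes.
for my $line (split /\n/, decode("ISO-8859-1", $out)) {
    printf "%d characters\n", length $line;
}
```

This only shows that lines are broken by character count; it does not
handle double-width characters or languages written without spaces
between words.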

see shy jo
