Re: lists.debian.org de-localization
From: Tomohiro KUBOTA <email@example.com>
Subject: Re: lists.debian.org de-localization
Date: Tue, 07 Jan 2003 21:45:05 +0900 (JST)
> I think more important problem is how to deal with raw 8bit mail
> headers without encoding specification or encodings which are not
> supported by the current set-up but used in Debian mailing lists
> (GB2312, BIG5, and KOI8-R).
I heard that the current development version of MHonArc has a feature
that treats raw 8bit characters as being in some specified encoding.
However, I don't think this can be a solution now, because it will take
a very long time for that version to become stable, then for the stable
version to be adopted into the unstable/testing versions of the Debian
distribution, then for the distribution to become stable (released), and
finally for the stable distribution to be adopted on master.debian.org.
Anyway, I can write a KOI8-R -> SGML entity (or "&#xxxx;" expression)
filter very easily. My plan is to assume raw 8bit characters to be
KOI8-R Russian and I think this can be achieved easily.
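To illustrate the idea, here is a minimal sketch of such a filter. It is
written in Python only for illustration (a real MHonArc filter would be
Perl), the function name koi8r_to_ncr is made up, and it assumes that
every byte of the input is KOI8-R and that decimal "&#NNNN;" references
are wanted.

```python
def koi8r_to_ncr(raw: bytes) -> str:
    """Decode bytes as KOI8-R and write non-ASCII characters as &#NNNN;."""
    # Assumption: undeclared 8-bit input is KOI8-R Russian.
    text = raw.decode("koi8_r")
    # ASCII passes through untouched; everything else becomes a
    # decimal numeric character reference.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)
```

Because the output is pure ASCII, the archived page no longer depends on
which charset the browser guesses.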
The remaining problem is how to handle unsupported encodings such as
GB2312 and Big5. I found that the current set-up of MHonArc on
lists.debian.org outputs GB2312 and Big5 as raw 8bit streams (or 16bit
streams, one could say, because these encodings are multibyte), and these
also cause encoding conflicts and the loss of the "<" in a following
"</em>". Thus I'd like these encodings to be converted into "&#xxxx;"
expressions.
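As a rough sketch of what I mean (again Python for illustration only, not
the actual lists.debian.org set-up), a message body could be decoded with
its declared charset and then written out as ASCII with "&#NNNN;"
references. The function name body_to_ascii_html and the KOI8-R fallback
for undeclared or unknown charsets are assumptions of this sketch, and it
does not do any HTML escaping of "<", ">" or "&".

```python
import codecs
from typing import Optional

# Assumption: text without a usable charset label is KOI8-R Russian.
FALLBACK_CHARSET = "koi8_r"


def body_to_ascii_html(raw: bytes, declared_charset: Optional[str]) -> str:
    """Return the body as ASCII, with non-ASCII characters as &#NNNN;."""
    charset = declared_charset or FALLBACK_CHARSET
    try:
        codecs.lookup(charset)
    except LookupError:
        # Unknown label: treat it like undeclared text.
        charset = FALLBACK_CHARSET
    # Malformed byte sequences become U+FFFD instead of raising.
    text = raw.decode(charset, errors="replace")
    # "xmlcharrefreplace" turns every non-ASCII character into &#NNNN;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

With output like this, no multibyte sequence reaches the browser, so a
following "</em>" cannot be swallowed by a misread lead byte.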
(Also, debian-esperanto people may want to use ISO-8859-3 and UTF-8.)
but I don't think this will work well, because it depends on the
Unicode::MapUTF8 module, which has been available as the
libunicode-maputf8-perl package only since Woody, whereas
master.debian.org runs Potato.
Alternatively, I might be able to write my own filter using
libtext-unicode-perl, but that package is also available only since Woody.
I don't know of any other way. Any suggestions?
Tomohiro KUBOTA <firstname.lastname@example.org>