Re: lists.debian.org de-localization
Hi,
From: Josip Rodin <joy@gkvk.hr>
Subject: Re: lists.debian.org de-localization
Date: Sun, 12 Jan 2003 04:14:45 +0100
> This, on the other hand, is a hassle to handle (backporting or installation
> into subdirs). master.d.o is scheduled to be upgraded to woody after samosa.
> That's all I know. <shrug>
This is good news. I will work on support for other encodings later.
Anyway, I don't expect the new master.d.o to have the development version
of MHonArc (the one that can assume an encoding for raw 8bit headers), even
if MHonArc is installed from a non-Debian-package version. So I think we
will need some method of our own to handle raw 8bit headers.
Here is a "filter" that converts 8bit characters (assumed to be KOI8-R) to
"&#xxxx;" expressions. I wrote it by imitating iso8859.pl, CharEnt.pm,
and UTF8.pm. The filter is applied to raw 7bit/8bit strings. Since the
7bit part of KOI8-R is identical to ASCII, it doesn't harm legal ASCII
headers. The filter is to be installed as
org/lists.debian.org/mhonarc/share/mhonarc/MHonArc/DEBIAN.pm and doesn't
depend on the version of MHonArc or Debian.
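## DEBIAN.pm by Tomohiro KUBOTA <kubota@debian.org>
##
## CHARSETCONVERTER module that assumes the input string is KOI8-R
## and converts it into &#xxx; expressions, where xxx is the decimal
## Unicode code point.
## The package name has to match the MHonArc::DEBIAN::koi8r2sgml
## function name used in the debian.rc change.
package MHonArc::DEBIAN;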
%US_ASCII_To_Ent = (
#--------------------------------------------------------------------------
# Hex Code Entity Ref # ISO external entity and description
#--------------------------------------------------------------------------
0x22, "&quot;", # ISOnum : Quotation mark
0x26, "&amp;", # ISOnum : Ampersand
0x3C, "&lt;", # ISOnum : Less-than sign
0x3E, "&gt;", # ISOnum : Greater-than sign
);
%KOI8_R_To_Ent = (
#--------------------------------------------------------------------------
# Hex Code Entity Ref # ISO external entity and description
#--------------------------------------------------------------------------
0x80, "&#9472;", # BOX DRAWINGS LIGHT HORIZONTAL
0x81, "&#9474;", # BOX DRAWINGS LIGHT VERTICAL
0x82, "&#9484;", # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x83, "&#9488;", # BOX DRAWINGS LIGHT DOWN AND LEFT
0x84, "&#9492;", # BOX DRAWINGS LIGHT UP AND RIGHT
0x85, "&#9496;", # BOX DRAWINGS LIGHT UP AND LEFT
0x86, "&#9500;", # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x87, "&#9508;", # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x88, "&#9516;", # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x89, "&#9524;", # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x8a, "&#9532;", # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x8b, "&#9600;", # UPPER HALF BLOCK
0x8c, "&#9604;", # LOWER HALF BLOCK
0x8d, "&#9608;", # FULL BLOCK
0x8e, "&#9612;", # LEFT HALF BLOCK
0x8f, "&#9616;", # RIGHT HALF BLOCK
0x90, "&#9617;", # LIGHT SHADE
0x91, "&#9618;", # MEDIUM SHADE
0x92, "&#9619;", # DARK SHADE
0x93, "&#8992;", # TOP HALF INTEGRAL
0x94, "&#9632;", # BLACK SQUARE
0x95, "&#8729;", # BULLET OPERATOR
0x96, "&#8730;", # SQUARE ROOT
0x97, "&#8776;", # ALMOST EQUAL TO
0x98, "&#8804;", # LESS-THAN OR EQUAL TO
0x99, "&#8805;", # GREATER-THAN OR EQUAL TO
0x9a, "&#160;", # NO-BREAK SPACE
0x9b, "&#8993;", # BOTTOM HALF INTEGRAL
0x9c, "&#176;", # DEGREE SIGN
0x9d, "&#178;", # SUPERSCRIPT TWO
0x9e, "&#183;", # MIDDLE DOT
0x9f, "&#247;", # DIVISION SIGN
0xa0, "&#9552;", # BOX DRAWINGS DOUBLE HORIZONTAL
0xa1, "&#9553;", # BOX DRAWINGS DOUBLE VERTICAL
0xa2, "&#9554;", # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0xa3, "&#1105;", # CYRILLIC SMALL LETTER IO
0xa4, "&#9555;", # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0xa5, "&#9556;", # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0xa6, "&#9557;", # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0xa7, "&#9558;", # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0xa8, "&#9559;", # BOX DRAWINGS DOUBLE DOWN AND LEFT
0xa9, "&#9560;", # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0xaa, "&#9561;", # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0xab, "&#9562;", # BOX DRAWINGS DOUBLE UP AND RIGHT
0xac, "&#9563;", # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0xad, "&#9564;", # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0xae, "&#9565;", # BOX DRAWINGS DOUBLE UP AND LEFT
0xaf, "&#9566;", # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0xb0, "&#9567;", # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0xb1, "&#9568;", # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0xb2, "&#9569;", # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0xb3, "&#1025;", # CYRILLIC CAPITAL LETTER IO
0xb4, "&#9570;", # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0xb5, "&#9571;", # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0xb6, "&#9572;", # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0xb7, "&#9573;", # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0xb8, "&#9574;", # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0xb9, "&#9575;", # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0xba, "&#9576;", # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0xbb, "&#9577;", # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0xbc, "&#9578;", # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0xbd, "&#9579;", # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0xbe, "&#9580;", # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0xbf, "&#169;", # COPYRIGHT SIGN
0xc0, "&#1102;", # CYRILLIC SMALL LETTER YU
0xc1, "&#1072;", # CYRILLIC SMALL LETTER A
0xc2, "&#1073;", # CYRILLIC SMALL LETTER BE
0xc3, "&#1094;", # CYRILLIC SMALL LETTER TSE
0xc4, "&#1076;", # CYRILLIC SMALL LETTER DE
0xc5, "&#1077;", # CYRILLIC SMALL LETTER IE
0xc6, "&#1092;", # CYRILLIC SMALL LETTER EF
0xc7, "&#1075;", # CYRILLIC SMALL LETTER GHE
0xc8, "&#1093;", # CYRILLIC SMALL LETTER HA
0xc9, "&#1080;", # CYRILLIC SMALL LETTER I
0xca, "&#1081;", # CYRILLIC SMALL LETTER SHORT I
0xcb, "&#1082;", # CYRILLIC SMALL LETTER KA
0xcc, "&#1083;", # CYRILLIC SMALL LETTER EL
0xcd, "&#1084;", # CYRILLIC SMALL LETTER EM
0xce, "&#1085;", # CYRILLIC SMALL LETTER EN
0xcf, "&#1086;", # CYRILLIC SMALL LETTER O
0xd0, "&#1087;", # CYRILLIC SMALL LETTER PE
0xd1, "&#1103;", # CYRILLIC SMALL LETTER YA
0xd2, "&#1088;", # CYRILLIC SMALL LETTER ER
0xd3, "&#1089;", # CYRILLIC SMALL LETTER ES
0xd4, "&#1090;", # CYRILLIC SMALL LETTER TE
0xd5, "&#1091;", # CYRILLIC SMALL LETTER U
0xd6, "&#1078;", # CYRILLIC SMALL LETTER ZHE
0xd7, "&#1074;", # CYRILLIC SMALL LETTER VE
0xd8, "&#1100;", # CYRILLIC SMALL LETTER SOFT SIGN
0xd9, "&#1099;", # CYRILLIC SMALL LETTER YERU
0xda, "&#1079;", # CYRILLIC SMALL LETTER ZE
0xdb, "&#1096;", # CYRILLIC SMALL LETTER SHA
0xdc, "&#1101;", # CYRILLIC SMALL LETTER E
0xdd, "&#1097;", # CYRILLIC SMALL LETTER SHCHA
0xde, "&#1095;", # CYRILLIC SMALL LETTER CHE
0xdf, "&#1098;", # CYRILLIC SMALL LETTER HARD SIGN
0xe0, "&#1070;", # CYRILLIC CAPITAL LETTER YU
0xe1, "&#1040;", # CYRILLIC CAPITAL LETTER A
0xe2, "&#1041;", # CYRILLIC CAPITAL LETTER BE
0xe3, "&#1062;", # CYRILLIC CAPITAL LETTER TSE
0xe4, "&#1044;", # CYRILLIC CAPITAL LETTER DE
0xe5, "&#1045;", # CYRILLIC CAPITAL LETTER IE
0xe6, "&#1060;", # CYRILLIC CAPITAL LETTER EF
0xe7, "&#1043;", # CYRILLIC CAPITAL LETTER GHE
0xe8, "&#1061;", # CYRILLIC CAPITAL LETTER HA
0xe9, "&#1048;", # CYRILLIC CAPITAL LETTER I
0xea, "&#1049;", # CYRILLIC CAPITAL LETTER SHORT I
0xeb, "&#1050;", # CYRILLIC CAPITAL LETTER KA
0xec, "&#1051;", # CYRILLIC CAPITAL LETTER EL
0xed, "&#1052;", # CYRILLIC CAPITAL LETTER EM
0xee, "&#1053;", # CYRILLIC CAPITAL LETTER EN
0xef, "&#1054;", # CYRILLIC CAPITAL LETTER O
0xf0, "&#1055;", # CYRILLIC CAPITAL LETTER PE
0xf1, "&#1071;", # CYRILLIC CAPITAL LETTER YA
0xf2, "&#1056;", # CYRILLIC CAPITAL LETTER ER
0xf3, "&#1057;", # CYRILLIC CAPITAL LETTER ES
0xf4, "&#1058;", # CYRILLIC CAPITAL LETTER TE
0xf5, "&#1059;", # CYRILLIC CAPITAL LETTER U
0xf6, "&#1046;", # CYRILLIC CAPITAL LETTER ZHE
0xf7, "&#1042;", # CYRILLIC CAPITAL LETTER VE
0xf8, "&#1068;", # CYRILLIC CAPITAL LETTER SOFT SIGN
0xf9, "&#1067;", # CYRILLIC CAPITAL LETTER YERU
0xfa, "&#1047;", # CYRILLIC CAPITAL LETTER ZE
0xfb, "&#1064;", # CYRILLIC CAPITAL LETTER SHA
0xfc, "&#1069;", # CYRILLIC CAPITAL LETTER E
0xfd, "&#1065;", # CYRILLIC CAPITAL LETTER SHCHA
0xfe, "&#1063;", # CYRILLIC CAPITAL LETTER CHE
0xff, "&#1066;", # CYRILLIC CAPITAL LETTER HARD SIGN
);
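sub koi8r2sgml {
    my $data = $_[0];
    my ($len, $ret, $char, $offset);

    $len = length($data); $ret = ""; $offset = 0;
    # Walk the string one byte at a time.
    while ($offset < $len) {
        $char = unpack("C", substr($data, $offset++, 1));
        if ($char < 128) {
            # 7bit: escape only the HTML-special characters.
            $ret .= ($US_ASCII_To_Ent{$char} || pack("C", $char));
        } else {
            # 8bit: treat the byte as KOI8-R. Every value 0x80-0xff is
            # in the table, so the fallback is only a safety net.
            $ret .= ($KOI8_R_To_Ent{$char} || pack("C", $char));
        }
    }
    $ret;
}
1;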
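For anyone who wants to cross-check the table above, here is a small
standalone sketch. It is not part of DEBIAN.pm and is not meant for
master.d.o: it assumes a Perl that ships the Encode module (bundled with
Perl 5.8 and later), and koi8r_to_ncr is only a hypothetical name I use
here. It decodes the bytes as KOI8-R and emits the same kind of decimal
"&#xxxx;" expressions, so its output can be compared with the table.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper for checking the table; not part of DEBIAN.pm.
# Converts a KOI8-R byte string to ASCII with decimal character references.
sub koi8r_to_ncr {
    my ($bytes) = @_;

    # Let Encode map KOI8-R bytes to Unicode characters.
    my $text = decode('koi8-r', $bytes);

    # The same four HTML-special characters as %US_ASCII_To_Ent.
    my %special = (
        '"' => '&quot;',
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
    );

    my $out = '';
    for my $ch (split //, $text) {
        my $cp = ord($ch);
        if ($cp < 128) {
            # 7bit part: identical to ASCII, escape only specials.
            $out .= exists $special{$ch} ? $special{$ch} : $ch;
        } else {
            # 8bit part: decimal Unicode code point.
            $out .= "&#$cp;";
        }
    }
    return $out;
}

# The six bytes below spell a Russian greeting in KOI8-R.
print koi8r_to_ncr("\xf0\xd2\xc9\xd7\xc5\xd4"), "\n";
# prints: &#1055;&#1088;&#1080;&#1074;&#1077;&#1090;
```

The printed values match the entries for 0xf0, 0xd2, 0xc9, 0xd7, 0xc5
and 0xd4 in the table.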
--- debian.rc 2003-01-12 12:33:02.000000000 +0900
+++ debian.rc.new 2003-01-12 12:35:43.000000000 +0900
@@ -3,7 +3,7 @@
<!-- Common Resources -------------------------------------------------------->
<CharsetConverters>
-plain; mhonarc::htmlize;
+plain; MHonArc::DEBIAN::koi8r2sgml; MHonArc/DEBIAN.pm
us-ascii; mhonarc::htmlize;
iso-8859-1; iso_8859::str2sgml; iso8859.pl
iso-8859-2; iso_8859::str2sgml; iso8859.pl