Re: lists.debian.org de-localization
Tomohiro KUBOTA <firstname.lastname@example.org>:
> The key point is that when we receive a mail with raw 8bit characters,
> we don't have an easy and reliable method to tell whether the characters
> are from ISO-8859-1 or KOI8-R or other character sets.
If the headers contain 8-bit octets and are valid as UTF-8, it's
fairly safe to assume that they really are UTF-8. Otherwise, you could
look for a Content-Type field or make it depend on the mailing list.
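A minimal sketch of that fallback order in Python, assuming the raw header bytes and the parsed message are already at hand. The function name and the per-list default are hypothetical. Note that a Content-Type charset strictly describes the body, so using it for headers is only a heuristic.

```python
from email.message import Message
from typing import Optional


def guess_header_charset(raw: bytes,
                         msg: Optional[Message] = None,
                         list_default: str = "iso-8859-1") -> str:
    """Guess the charset of raw header bytes (hypothetical helper)."""
    # Pure 7-bit data needs no guessing.
    if all(b < 0x80 for b in raw):
        return "us-ascii"
    # 8-bit data that is valid UTF-8 is very likely UTF-8: legacy 8-bit
    # text rarely forms valid multi-byte sequences by accident.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Fall back to the Content-Type charset parameter, if present.
    # (It describes the body, so this is only a heuristic for headers.)
    if msg is not None:
        charset = msg.get_content_charset()
        if charset:
            return charset
    # Last resort: a per-list default, e.g. "koi8-r" for a Russian list
    # (assumed configuration, not an existing lists.debian.org setting).
    return list_default


msg = Message()
msg["Content-Type"] = 'text/plain; charset="koi8-r"'
print(guess_header_charset("Привет".encode("utf-8")))         # utf-8
print(guess_header_charset("Привет".encode("koi8_r"), msg))   # koi8-r
```

The UTF-8 test comes first because it is the only check here that can positively confirm an encoding; the other two are just guesses.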
> An easy way is to assume *all* raw 8bit characters to be KOI8-R and
> convert them into SGML entities. However, I don't know whether there are
> some other languages where a certain amount of non-spammer people
> use raw 8bit characters. If they exist, they will complain on this
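The quoted "assume KOI8-R" approach is nearly a one-liner in Python: the koi8_r codec maps every byte value, and the xmlcharrefreplace error handler emits decimal numeric character references such as &#1055;. This is a sketch with a hypothetical function name. It also illustrates the worry above, since Latin-1 text passes through without any error but comes out as Cyrillic.

```python
def koi8r_to_sgml_entities(raw: bytes) -> str:
    """Treat every byte as KOI8-R and escape non-ASCII as &#NNNN; (hypothetical helper)."""
    # KOI8-R assigns a character to all 256 byte values, so decoding never
    # fails, which is also why it cannot notice input in another charset.
    text = raw.decode("koi8_r")
    # Replace each non-ASCII character with a decimal character reference.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(koi8r_to_sgml_entities("Привет".encode("koi8_r")))
# &#1055;&#1088;&#1080;&#1074;&#1077;&#1090;
print(koi8r_to_sgml_entities("café".encode("latin-1")))
# "caf" followed by a Cyrillic entity instead of &#233;
```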
I thought some Japanese non-spammers use ISO-2022-JP in headers, which
isn't 8-bit, but isn't US-ASCII either. Am I out of date?
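ISO-2022-JP text is 7-bit but switches character sets with escape sequences (ESC $ B into JIS X 0208, ESC ( B back to ASCII), so one rough heuristic is to look for those. A sketch with a hypothetical function name; it inspects raw header bytes only and does not decode RFC 2047 encoded-words.

```python
# Escape sequences used by ISO-2022-JP (RFC 1468) to switch character sets.
ISO2022JP_ESCAPES = (
    b"\x1b$@",  # JIS C 6226-1978
    b"\x1b$B",  # JIS X 0208-1983
    b"\x1b(B",  # ASCII
    b"\x1b(J",  # JIS X 0201-Roman
)


def looks_like_iso2022jp(raw: bytes) -> bool:
    """Rough check for raw ISO-2022-JP text (hypothetical helper)."""
    # ISO-2022-JP is a 7-bit encoding, so any high-bit byte rules it out.
    if any(b >= 0x80 for b in raw):
        return False
    # Otherwise, the presence of one of its escape sequences is a strong hint.
    return any(seq in raw for seq in ISO2022JP_ESCAPES)


encoded = "日本語".encode("iso2022_jp")
print(looks_like_iso2022jp(encoded))                  # True
print(looks_like_iso2022jp("日本語".encode("utf-8")))  # False
```

Such input would pass a "no 8-bit octets" test, so a filter that only looks for raw 8-bit bytes would treat it as plain ASCII.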