Re: lists.debian.org de-localization (Re: automatically-generated ISO-8859-1 characters in multibyte webpages)
</lurk>
>>>>> "Marco" == Marco d'Itri <md@Linux.IT> writes:
Marco> It would be *MUCH* better to just refuse these
Marco> messages. Most of them are spam anyway. At least in my
Marco> country (and in all western europe, I think) raw latin-1
Marco> characters in headers are never found outside of spam
Marco> messages.
He did say "Russian." On xemacs-users-ru, which is dedicated to
Russian-language posts, about half the users use RFC-2047 encoded-words,
and the rest are split evenly between ASCII-only and 8-bit Cyrillic.
"Raw Cyrillic in headers" is used by some of the more sophisticated
users, too, surprisingly enough.
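To make the three groups above concrete, here is a rough sketch of how a Subject value could be sorted into them. It is only an illustration: the function name, the category labels, and the simplified encoded-word pattern are my own assumptions, not anything the list software actually runs.

```python
import re

# Simplified RFC 2047 encoded-word shape: =?charset?B-or-Q?encoded-text?=
# (assumption: this loose pattern is good enough for illustration)
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def classify_subject(raw: bytes) -> str:
    """Return 'raw-8bit', 'encoded-word', or 'ascii-only' (hypothetical labels)."""
    # Any byte with the high bit set means unencoded 8-bit text in the header.
    if any(b >= 0x80 for b in raw):
        return "raw-8bit"
    # Pure ASCII that contains an encoded-word is the RFC 2047 style.
    if ENCODED_WORD.search(raw):
        return "encoded-word"
    return "ascii-only"
```

Run over the raw Subject bytes of a mailbox, a tally of the three return values would give the kind of split described above.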
This is a fairly small sample (about 100 subscribers, 25 regular
posters). However, the Russian spam I've seen (isn't it funny how you
can identify spam even though you can't read the language it's written
in?) invariably fails either the addressee tests (implicit, too many),
the known spam software test, or the HTML-only test. So (FWIW) I've
disabled the 8-bit test and so far the Russian subscribers are happy.
I will also say I've seen a fair amount of dumbquotes from MS-encumbered
posters, and the occasional accented Latin character from French and
German posters (those are rare, though not quite nonexistent).
Marco> /^Subject: .*[^[:print:]]{8}/ REJECT Your mailer is not \
Marco> RFC 2047 compliant
If you're going to do that, 8 is probably too many (SPC is not an
8-bit character---I find 3 works well) and the reason should be
failure to comply with RFC 2822. AFAIK 2047 does not prohibit 8-bit
characters, it simply provides a mechanism to encode them in
environments where they are prohibited.
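To show why the run length matters, here is a small Python sketch of the same idea. It is an approximation, not the Postfix rule: it matches only bytes with the high bit set, whereas [^[:print:]] also catches control characters, and the function name and default are made up for the example.

```python
import re


def has_raw_8bit_run(header_value: bytes, min_run: int = 3) -> bool:
    """True if the header contains min_run or more consecutive 8-bit bytes.

    Hypothetical helper; min_run=3 is the threshold suggested above,
    while the quoted rule used 8.
    """
    # Bytes 0x80-0xff are the "raw 8-bit" characters; SPC (0x20) is not one
    # of them, so every space between words ends a run.
    pattern = re.compile(rb"[\x80-\xff]{%d,}" % min_run)
    return pattern.search(header_value) is not None
```

With a two-word KOI8-R subject whose words are six and four letters long, a threshold of 3 triggers but a threshold of 8 does not, because the space splits the run. A single stray accented Latin-1 character triggers neither.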
<lurk>
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.