
Re: reject non-english mail as spam?



On Tue, Jan 13, 2004 at 10:10:08PM +0200, Micha Feigin wrote:
> On Tue, Jan 13, 2004 at 01:03:30AM -0700, Lucas Albers wrote:
> > I keep getting spam on the list that is completely foreign.
> > 
> > SA scores it as this in regards to the foreign language component:
> >  1.5 BODY_8BITS              BODY: Body includes 8 consecutive 8-bit characters
> >  2.8 UNWANTED_LANGUAGE_BODY  BODY: Message written in an undesired language
> >  3.2 CHARSET_FARAWAY         BODY: Character set indicates a foreign language
> >  3.2 CHARSET_FARAWAY_HEADER  A foreign language charset used in headers
> >  2.5 MIME_CHARSET_FARAWAY    MIME character set indicates foreign language
> > 
> > 
> > Can we tune the SA rules for this list to reject completely non-English
> > email?

This is non-trivial. The current plan is to just have some procmail
rules to deal with this, which can be set on a per-list basis, instead
of having different SpamAssassin configurations for different lists.

As you can imagine, those rules would create mayhem on lists such as
debian-chinese-big5.
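
To make the per-list idea concrete, here is a rough sketch in Python
(not procmail, and not what actually runs on murphy). The table
contents and the name rejected_charsets are made up for illustration;
only the debian-chinese-big5 exemption and the GB2312 charset come
from this thread.

```python
# Hypothetical per-list policy: which declared charsets a list rejects.
# Names and contents are illustrative assumptions, not a real configuration.
DEFAULT_REJECTED = frozenset({"gb2312"})

# Lists that legitimately carry non-English mail override the default.
PER_LIST_OVERRIDES = {
    "debian-chinese-big5": frozenset(),  # reject nothing based on charset
}


def rejected_charsets(list_name):
    """Return the set of charsets that should be rejected for a given list."""
    return PER_LIST_OVERRIDES.get(list_name, DEFAULT_REJECTED)
```

Keeping the exemptions in one small table means the filter itself stays
identical for every list, and only the lookup differs.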

> > Or can it be assumed that people will be posting non-English email to this
> > list?

Not really. This list is supposed to be in English.

> This is not my expertise, but how will English emails sent from
> computers of people using UTF-8 (possibly set up to handle another
> language) fare with this test?

They should be fine. I picked a list at random the other day and did 
some quick analysis of what was getting blocked on murphy, and what 
was going through. I posted the results (so far) at:

http://www.redellipse.net/stuff/2004/01/12#2004011201

Of the 98 messages which did make it onto debian-devel, 68 could have
been blocked if we had a test on the GB2312 charset. I'm
assuming there will be similar results on other lists at the moment.
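
For what it's worth, a test on the declared charset could look roughly
like the Python sketch below. It is only an illustration using the
standard email module, not the check used for the analysis above; the
function names declared_charsets and would_block are invented here,
and looking only at Subject and From is an assumption. It also shows
why English mail declared as UTF-8 would pass: only the declared
charset is inspected, not the language of the text.

```python
import email
from email.header import decode_header


def declared_charsets(raw_message):
    """Return the lowercased charsets declared in MIME parts and in
    RFC 2047 encoded words of the Subject and From headers."""
    msg = email.message_from_string(raw_message)
    charsets = set()

    # Charset parameters on the Content-Type of each MIME part.
    for part in msg.walk():
        charset = part.get_content_charset()  # lowercased, or None
        if charset:
            charsets.add(charset)

    # Charsets named in encoded words such as =?gb2312?B?...?=
    for name in ("Subject", "From"):
        value = msg.get(name)
        if value:
            for _text, charset in decode_header(value):
                if charset:
                    charsets.add(charset.lower())
    return charsets


def would_block(raw_message, blocked=frozenset({"gb2312"})):
    """True if the message declares any charset in the blocked set."""
    return bool(declared_charsets(raw_message) & blocked)
```

Something like would_block(raw, blocked=frozenset()) would then let
everything through on a list where such charsets are expected.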

	Cheers,

Pasc
:wq


