Re: Attempts to poison bayesian systems

To: russell@coker.com.au
Cc: "Jason Lim" <maillist@jasonlim.com>, debian-isp@lists.debian.org
Subject: Re: Attempts to poison bayesian systems
From: "Lucas Albers" <albersl@cs.montana.edu>
Date: Tue, 23 Dec 2003 20:52:54 -0700 (MST)
Message-id: <2583.216.166.133.165.1072237974.squirrel@www.cs.montana.edu>
Reply-to: albersl+debianAZF8744@cs.montana.edu
In-reply-to: <200312240200.19321.russell@coker.com.au>
References: <20031223132530.GA9089@vnl.com> <200312240052.45792.russell@coker.com.au> <55d001c3c960$306eff90$de00a8c0@tw1> <200312240200.19321.russell@coker.com.au>

Russell Coker said:
> Also it makes it slightly more difficult for good filters to catch the spam,
> but at the cost of making the spam less effective.
>
> Guys who will get their credit card out when reading a clear message offering
> to double their penis size probably won't do so if the penis message is mixed
> in with Shakespeare...

I have played around a LOT with SpamAssassin filtering, and you can
significantly raise the score SA assigns to spam by adding extra checks.
Custom CF rules: look at the evilrules (search Google) for great rules on
catching garbage HTML obfuscation.
I have a 10,000 line custom local cf file.

Razor+Pyzor+DCC checks: adding network checks will raise the score.
Use the newest version of SA; its bayes scores are significantly higher.
Stateful analysis of normal message traffic:
I use mimedefang+sendmail+SA, which allows me to do a more thorough
analysis of message traffic, considering all the components of the traffic,
compared to just SA.

Look up greylisting for some more ideas on blocking email.
I also use greylisting, and it has cut both my spam volume and my mail
server utilization, since I can reject before content analysis.
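
For anyone who has not seen greylisting, here is a minimal sketch of the
idea in Python. It is not what any particular greylisting package does
internally; the class name Greylist, the 300 second delay, and the in-memory
dict are all assumptions made for illustration. The first delivery attempt
for an unseen (client IP, sender, recipient) triplet gets a temporary
failure, and a retry after the delay is accepted.

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient).

    Hypothetical helper for illustration only; a real deployment would
    persist the triplets and expire old entries.
    """

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a new triplet must wait (assumed value)
        self.first_seen = {}         # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return "defer" (SMTP 4xx temporary failure) or "accept"."""
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None:
            # Never seen this triplet: remember it and ask the client to retry.
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            # Retried too quickly: keep deferring.
            return "defer"
        # A real MTA retried after the delay: let the message through.
        return "accept"
```

Because the decision only needs the envelope, it can be made before the
message body is received, which is why the content filters never see the
deferred mail.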

To defeat bayes poisoning you first need to determine whether it has
occurred, by analyzing letter frequency in the first 400 bytes and last 400
bytes of the email. This is from SA developers' comments.

You can determine the normal letter frequency: which letters come before
and after each other, and how often. Then you can detect when a message does
not follow this pattern, which indicates bayes poisoning.
I believe this is a future feature of SA.
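
A rough sketch of that letter-pair check in Python, to make the idea
concrete. This is not SA code, and I do not know how the SA developers would
actually implement it. The function names, the add-one smoothing, and the
choice of an average log-probability score are my own assumptions. It counts
which letter follows which in known-good text, then scores the first and last
400 bytes of a message; a much lower score than normal mail gets would
suggest the text does not look like the usual traffic.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase, map anything outside a-z to a space, collapse spaces."""
    chars = [ch if ch in ALPHABET else " " for ch in text.lower()]
    return " ".join("".join(chars).split())


def train(corpus):
    """Count letter pairs (and first letters of pairs) in known-good text."""
    text = normalize(corpus)
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return pairs, firsts


def avg_log_prob(model, sample):
    """Average log P(next letter | letter); lower means less like the corpus."""
    pairs, firsts = model
    text = normalize(sample)
    if len(text) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(text, text[1:]):
        # Add-one smoothing so unseen pairs get a small nonzero probability.
        p = (pairs[(a, b)] + 1) / (firsts[a] + len(ALPHABET))
        total += math.log(p)
    return total / (len(text) - 1)


def head_and_tail(body, window=400):
    """Return the first and last `window` bytes of the body as text."""
    data = body.encode("utf-8", errors="replace")
    head = data[:window].decode("utf-8", errors="replace")
    tail = data[-window:].decode("utf-8", errors="replace")
    return head, tail
```

To use it, you would train on a sample of your own normal mail, score the
head and tail of each incoming message, and pick a cutoff from what your
normal mail scores. The cutoff has to be tuned locally; I have no number to
suggest.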

-- 
--Luke CS Sysadmin, Montana State University-Bozeman
