Re: Attempts to poison bayesian systems

On Tue, Dec 23, 2003 at 01:36:20PM +0000, Dale Amon wrote:
> > I have yet to see a false positive caused by this even though I get
> > quite a lot of this stuff and routinely mark it as spam.
> I can't think of any other reason for someone to do it
> though. There has to be a point. Someone is going to a 
> lot of trouble.

Could they be padding their messages with all these non-spam words to
generate false negatives and so slip past Bayesian filters?  I've seen
lots of these messages get through SpamAssassin in the past week or so,
all with very low Bayes scores.  Training the Bayesian classifier on
these messages is obviously not going to do me much good, because the
next spam is going to have a completely different set of tokens.
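
To make the dilution effect concrete, here is a minimal sketch of how
padding can drag a combined score down.  It assumes a simplified
Graham-style naive Bayes combination over every token in the message;
the per-token probabilities are invented for illustration, and real
filters (SpamAssassin included) select and weight tokens differently.

```python
from math import prod

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities with the naive Bayes rule."""
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)

# Hypothetical per-token spam probabilities (assumptions, not measured data).
spam_tokens = [0.99, 0.95, 0.90]   # words the filter has learned as spammy
padding_tokens = [0.05] * 6        # innocuous words the filter sees as ham

plain_score = combined_spam_probability(spam_tokens)
padded_score = combined_spam_probability(spam_tokens + padding_tokens)

print(f"without padding: {plain_score:.4f}")
print(f"with padding:    {padded_score:.4f}")
```

With these made-up numbers the unpadded message scores close to 1 and
the padded one close to 0, even though the spammy tokens are still
there.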

This method is especially effective when the Bayesian classifier only
looks at the first MIME part, because the second part is then free to
contain whatever spam tokens the sender wants to put in it.  IIRC, this
is how most Bayesian filters behave.
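
If a filter really does tokenize only the first part, the gap is easy
to see.  The following is a rough sketch using Python's standard email
module; the sample message and the text_parts() helper are made up for
illustration and do not reflect how any particular filter is written.

```python
from email import message_from_string

# Hypothetical two-part message: innocuous padding first, payload second.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

weather meeting lunch notes
--XYZ
Content-Type: text/plain

cheap pills offer
--XYZ--
"""

def text_parts(msg):
    """Return the payload of every text/* part, in document order."""
    return [part.get_payload()
            for part in msg.walk()
            if part.get_content_maintype() == "text"]

msg = message_from_string(RAW)
parts = text_parts(msg)

# A classifier that stops at the first part never sees the payload tokens.
first_only_tokens = set(parts[0].split())
all_tokens = {tok for body in parts for tok in body.split()}

print("first part only:", sorted(first_only_tokens))
print("all parts:      ", sorted(all_tokens))
```

Walking every part instead of just the first at least puts the spam
tokens back in front of the classifier, though it does nothing about
the padding itself.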


