
Re: Attempts to poison bayesian systems

On Tue, Dec 23, 2003 at 12:00:43PM -0500, Noah L. Meyerhans wrote:
> On Tue, Dec 23, 2003 at 01:36:20PM +0000, Dale Amon wrote:
> > > I have yet to see a false positive caused by this even though I get
> > > quite a lot of this stuff and routinely mark it as spam.
> > I can't think of any other reason for someone to do it
> > though. There has to be a point. Someone is going to a 
> > lot of trouble.
> Could it be the case that they're using all these non-spam words to
> generate false-negatives, thus bypassing bayesian filters?  I've seen
> lots of these messages get through spamassassin in the past week or
> so, all with very low bayes scores.  Training the bayesian classifier
> with these messages is obviously not going to do me much good, because
> the next spam is going to have a completely different set of tokens.

If you don't train it, it won't improve.  Bogofilter, for instance,
ignores tokens that haven't occurred very often or that occur equally
often in spam and ham.

I don't see what you lose by training, unless *all* spammers use words
that are common in your ham but uncommon in spam.  They'd have to be
very clever to pick the right words for everyone, and once they'd used
them a few times and we'd trained on them, filters like bogofilter
would ignore those words as statistically irrelevant, i.e. not a good
indicator either way.
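
To make that concrete, here is a minimal sketch of Robinson-style
smoothed token scoring with a minimum-deviation cutoff.  It is only an
illustration of the idea: the function names, the constants (s, x,
min_dev) and the message counts are assumptions of mine, not values
taken from bogofilter.

```python
def token_score(spam_count, ham_count, n_spam_msgs, n_ham_msgs, s=1.0, x=0.5):
    """Smoothed estimate that a message containing this token is spam.

    s is the weight given to the prior, x is the prior score assumed
    for a token we know nothing about (both are illustrative values).
    """
    n = spam_count + ham_count
    spam_freq = spam_count / n_spam_msgs if n_spam_msgs else 0.0
    ham_freq = ham_count / n_ham_msgs if n_ham_msgs else 0.0
    total = spam_freq + ham_freq
    if n == 0 or total == 0:
        return x  # never seen: fall back to the neutral prior
    p = spam_freq / total
    # Few observations pull the score towards x; many let p dominate.
    return (s * x + n * p) / (s + n)


def is_significant(score, min_dev=0.3):
    """Only tokens far enough from neutral (0.5) take part in scoring."""
    return abs(score - 0.5) >= min_dev


# A word common in my ham, before any spam containing it was trained:
before = token_score(spam_count=0, ham_count=40, n_spam_msgs=1000, n_ham_msgs=1000)
# The same word after spammers have used it and I trained on those spams:
after = token_score(spam_count=40, ham_count=40, n_spam_msgs=1000, n_ham_msgs=1000)
# A random word seen once, in a single spam:
rare = token_score(spam_count=1, ham_count=0, n_spam_msgs=1000, n_ham_msgs=1000)

print(is_significant(before), is_significant(after), is_significant(rare))
```

Under these assumed numbers the "poison" word starts out as a strong
ham indicator, but once it has been trained as spam equally often it
sits at the neutral score and is dropped, as is the word seen only
once.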

I mention bogofilter a lot because SpamAssassin didn't do Bayesian
filtering when I dropped it, so bogofilter is the only Bayesian filter
I've used.

> This method is especially effective in the case where the bayesian
> classifier only looks at the first MIME attachment, because the second
> is then free to contain whatever spam tokens they want to put in it.
> IIRC, this is how most bayesian filters behave.

Oh?  I don't believe bogofilter does that.  Which Bayesian filters only
look at the first attachment?
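
For what it's worth, reading every text part is not hard.  The sketch
below uses Python's standard email module to collect tokens from all
text/* parts of a message rather than just the first one.  The function
name and the token pattern are hypothetical, and this is not how
bogofilter or any other particular filter is implemented.

```python
import re
from email import message_from_string

TOKEN_RE = re.compile(r"[a-z0-9$'-]+")  # illustrative token pattern only


def tokens_from_all_parts(raw_message):
    """Return lowercase tokens from every text/* MIME part of a message."""
    msg = message_from_string(raw_message)
    tokens = []
    for part in msg.walk():  # visits the container and every sub-part
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary attachments
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # charset name Python does not know
            text = payload.decode("latin-1")
        tokens.extend(TOKEN_RE.findall(text.lower()))
    return tokens
```

With a tokenizer like this, spam words hidden in a second part are seen
along with the innocent-looking words in the first.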

Simon  [ huggie@earth.li ] *\             "The claw is our master."  \**
****** ]-+-+-+-+-+-+-+-+-[ **\                                        \*
****** [  Htag.pl 0.0.22 ] ***\                                        \
