Re: Attempts to poison bayesian systems

To: Dale Amon <amon@vnl.com>, debian-isp@lists.debian.org
Cc: debian-security@lists.debian.org
Subject: Re: Attempts to poison bayesian systems
From: Russell Coker <russell@coker.com.au>
Date: Wed, 24 Dec 2003 00:52:45 +1100
Message-id: <200312240052.45792.russell@coker.com.au>
Reply-to: russell@coker.com.au
In-reply-to: <20031223132530.GA9089@vnl.com>
References: <20031223132530.GA9089@vnl.com>

This discussion has some minor relevance to debian-isp, but nothing to do with 
debian-security.  Let's move the discussion to debian-isp.

On Wed, 24 Dec 2003 00:25, Dale Amon <amon@vnl.com> wrote:
> I've been noticing loads of mails like this lately:
>
>   emery atrocious larval drippy elate incontrollable raster anglicanism
>   checkerberry feed sit ajar saturable decathlon
>   already climate inhibition pagoda narcissus expository toni
>
> I can only assume someone out there is trying to attack
> bayesian systems by loading them up with all sorts of
> normal words so that good mail gets false positives, thus
> breaking the systems.

I'm getting about 5-10 of those per day to my personal mailbox, and another 10 
or more through mailing lists.

I don't think it's an active attempt to poison Bayesian systems, just an 
attempt to evade them by making the ratio of spam content to non-spam content 
in each message much lower.
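
To illustrate why lowering that ratio can work, here is a minimal sketch of a
naive Bayes style score. It is not how SpamAssassin or any particular filter
is implemented; the per-word probabilities, the NEUTRAL_PROB value, and the
spam_score name are all invented for the example. Real filters often weigh
only the most significant tokens, so the effect in practice may be smaller.

```python
import math

# Hypothetical P(spam | word) values, as a trained filter might have learned
# them. These numbers are made up for illustration.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "mortgage": 0.95,
    "unsubscribe": 0.90,
}
# Assumed score for words the filter has seen mostly in ordinary mail.
NEUTRAL_PROB = 0.4


def spam_score(words):
    """Combine per-word probabilities with the naive Bayes product rule."""
    log_spam = 0.0
    log_ham = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word.lower(), NEUTRAL_PROB)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


spam_words = ["viagra", "mortgage", "unsubscribe"]
padding = (
    "emery atrocious larval drippy elate incontrollable raster anglicanism "
    "checkerberry feed sit ajar saturable decathlon "
    "already climate inhibition pagoda narcissus expository toni"
).split()

print("spam words only:   ", spam_score(spam_words))
print("with random padding:", spam_score(spam_words + padding))
```

Under these assumed numbers, every padding word pulls the combined score
toward "not spam", so enough of them can push a message under the filter's
threshold without the filter's training data being changed at all.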

One technique that's being used a lot is to get books in electronic form and 
put a couple of sentences in every spam.  Sentences from a book will pass 
grammatical checking etc, unlike the example you posted above.  Text from a 
book will also have the right ratio of words: you will almost never find such 
a long "sentence" in an email message without a punctuation character, "and", 
"or", or other common words, except in the case of source code (which is 
another category in Bayesian filters).
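
The observation about punctuation and common words can be turned into a crude
check. The following is only a sketch under stated assumptions: the
COMMON_WORDS list, the min_words cutoff, and the looks_like_word_salad name
are hypothetical, and a real check would need a much longer word list and
some handling for source code.

```python
import re

# A small, assumed list of common English function words.
COMMON_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "for", "on", "with", "as", "was", "but", "be", "at", "by",
}


def looks_like_word_salad(text, min_words=10):
    """Return True for a long run of words with no punctuation and no
    common function words, as in the random-word spam quoted above."""
    words = re.findall(r"[A-Za-z']+", text)
    if len(words) < min_words:
        return False  # too short to judge either way
    has_punctuation = re.search(r"[.,;:!?]", text) is not None
    has_common_word = any(w.lower() in COMMON_WORDS for w in words)
    return not has_punctuation and not has_common_word


sample = (
    "emery atrocious larval drippy elate incontrollable raster anglicanism "
    "checkerberry feed sit ajar saturable decathlon "
    "already climate inhibition pagoda narcissus expository toni"
)
print(looks_like_word_salad(sample))
```

Text lifted from a book would pass a check like this, which is presumably why
spammers are moving to that technique.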

I've never done anything serious with Bayesian filters.  The machine that 
hosts my email has SpamAssassin doing something, but I've never investigated 
that (other people manage it).  I block spam myself using DNSBLs, iptables, 
and Postfix configuration.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page

Follow-Ups:
- Re: Attempts to poison bayesian systems
  - From: "Jason Lim" <maillist@jasonlim.com>

References:
- Attempts to poison bayesian systems
  - From: Dale Amon <amon@vnl.com>