Re: Attempts to poison bayesian systems
This discussion has some minor relevance to debian-isp, but nothing to do with
debian-security. Let's move the discussion to debian-isp.
On Wed, 24 Dec 2003 00:25, Dale Amon <amon@vnl.com> wrote:
> I've been noticing loads of mails like this lately:
>
> emery atrocious larval drippy elate incontrollable raster anglicanism
> checkerberry feed sit ajar saturable decathlon
> already climate inhibition pagoda narcissus expository toni
>
> I can only assume someone out there is trying to attack
> bayesian systems by loading them up with all sorts of
> normal words so that good mail gets false positives, thus
> breaking the systems.
I'm getting about 5-10 of those per day to my personal mailbox, and another 10
or more through mailing lists.
I don't think it's an active attempt to poison Bayesian systems, just an
attempt to evade them by making the ratio of spam content to non-spam content
much lower.
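
To make the dilution idea concrete, here is a minimal sketch of a toy
naive-Bayes style scorer. It is not how any particular filter works: the
per-token probabilities are invented for illustration, and the 0.4 score
for unseen tokens is an assumption. It only shows that padding a message
with words the filter has no spam evidence for drags the combined score
down.

```python
import math

# Hypothetical per-token spam probabilities, invented for illustration.
TOKEN_SPAM_PROB = {"viagra": 0.99, "cheap": 0.90, "click": 0.85}

# Assumed score for tokens the filter has never seen (slightly "hammy").
UNKNOWN_PROB = 0.4


def spam_score(text):
    """Combine per-token probabilities by summing their log-odds."""
    log_odds = 0.0
    for token in text.lower().split():
        p = TOKEN_SPAM_PROB.get(token, UNKNOWN_PROB)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Convert the summed log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


plain = "cheap viagra click"
padding = (
    "emery atrocious larval drippy elate incontrollable raster anglicanism "
    "checkerberry feed sit ajar saturable decathlon "
    "already climate inhibition pagoda narcissus expository toni"
)
padded = plain + " " + padding

print("plain :", round(spam_score(plain), 4))
print("padded:", round(spam_score(padded), 4))
```

The padded message carries exactly the same spam tokens, but every
unknown word pulls the total towards "not spam". A filter that only looks
at the most significant tokens would be less affected than this toy one.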
One technique that's being used a lot is to get books in electronic form and
put a couple of sentences in every spam. Sentences from a book will pass
grammatical checking etc, unlike the example you posted above, and text
from a book will also have the right ratio of words. You will almost never
find such a long "sentence" in an email message without a punctuation
character, "and", "or", or other common words, except in the case of source
code (which is another category in Bayesian filters).
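
The observation about long runs of words with no punctuation and no
common words could be turned into a rough check along these lines. This is
only a sketch: the common-word list and the run length of 12 are
assumptions picked for the example, not values taken from any real filter,
and it would also fire on some source code.

```python
import re

# Assumed short list of common words; a real list would be much longer.
COMMON_WORDS = {"and", "or", "the", "a", "of", "to", "in", "is"}


def looks_like_word_salad(text, min_run=12):
    """Return True if text contains min_run consecutive words with
    no punctuation character and no common word between them."""
    run = 0
    # Tokens are either runs of letters or single non-letter,
    # non-space characters (punctuation, digits).
    for token in re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text):
        if token.isalpha() and token.lower() not in COMMON_WORDS:
            run += 1
            if run >= min_run:
                return True
        else:
            # Punctuation, a digit, or a common word ends the run.
            run = 0
    return False


salad = (
    "emery atrocious larval drippy elate incontrollable raster anglicanism "
    "checkerberry feed sit ajar saturable decathlon "
    "already climate inhibition pagoda narcissus expository toni"
)
prose = (
    "One technique that's being used a lot is to get books in "
    "electronic form and put a couple of sentences in every spam."
)

print(looks_like_word_salad(salad))
print(looks_like_word_salad(prose))
```

Text lifted from a book would pass this check, which is presumably why
spammers are moving to it.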
I've never done anything serious with Bayesian filters. The machine that
hosts my email has SpamAssassin doing something, but I've never investigated
that (other people manage it). I manage using DNSBL, iptables, and Postfix
configuration for blocking spam.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page