
SpamAssassin - how about bogofilter?



Earlier today I enabled SpamAssassin filtering for all mail sent to the BTS.
Procmail and SpamAssassin were recently installed for the owner@bugs mail,
where we have monitored their progress and hit rate.

I was just wondering whether anyone has tried a statistical approach (e.g.
bogofilter) to filtering spam. I used SpamAssassin for half a year,
and the spam writers were just too creative for SpamAssassin's rules
to keep up. A few announcement mailing lists I was subscribed to were
regularly tagged as spam (which they weren't, since I'd asked for them),
and some spam regularly got through the filter.

I switched to bogofilter a month ago and trained it with my corpus of
around 1000 spam messages and several thousand ham (good) messages. No
false positives and only a few negatives (mainly announcements that
were in the spam corpus I used to train bogofilter). I run each
e-mail through the filter and add its statistics to the ham or spam
dictionary, as appropriate, so bogofilter keeps getting better over
time.
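For readers unfamiliar with the statistical approach, here is a minimal sketch of the idea: count how often each word appears in spam and ham, turn those counts into per-word spam probabilities, combine the most decisive words into one score, and feed every classified message back into the counts. This is a simplified naive-Bayes illustration, not bogofilter's actual implementation (bogofilter's scoring is more refined and is based on Gary Robinson's work); the class name `WordFilter`, the tokenizer, and the parameters `s`, `x`, `max_tokens` and `threshold` are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: words of 3+ characters, lowercased.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]{3,}")


def tokenize(text):
    """Split a message into a set of lowercase tokens (each counted once)."""
    return {t.lower() for t in TOKEN_RE.findall(text)}


class WordFilter:
    """Toy word-statistics spam filter (illustrative, not bogofilter)."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Add one message's tokens to the spam or ham word lists."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_spamicity(self, token, s=1.0, x=0.5):
        """Smoothed probability that a message containing `token` is spam.

        Rarely seen tokens are pulled toward the neutral value x; s controls
        how strongly. The result stays strictly between 0 and 1.
        """
        b = self.spam_counts[token] / max(self.n_spam, 1)
        g = self.ham_counts[token] / max(self.n_ham, 1)
        n = self.spam_counts[token] + self.ham_counts[token]
        p = b / (b + g) if (b + g) > 0 else x
        return (s * x + n * p) / (s + n)

    def score(self, text, max_tokens=15):
        """Combine the most decisive token probabilities into a spam score."""
        probs = [self.token_spamicity(t) for t in tokenize(text)]
        # Keep only the tokens furthest from the neutral 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:max_tokens]
        if not probs:
            return 0.5
        # prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        z = min(log_ham - log_spam, 700.0)  # guard against exp() overflow
        return 1.0 / (1.0 + math.exp(z))

    def classify_and_learn(self, text, threshold=0.9):
        """Classify a message, then add it to the matching word list."""
        is_spam = self.score(text) >= threshold
        self.train(text, is_spam)
        return is_spam
```

The `classify_and_learn` method mirrors the "run each e-mail through the filter and add its statistics" workflow described above. Note that automatic self-training like this also reinforces mistakes, so misclassified messages need to be corrected by hand.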

One major advantage of bogofilter (in addition to being more accurate
than SpamAssassin) is that it's blazingly fast. Even though I have a
high-end computer and a fast net connection, SpamAssassin really eats
up CPU cycles when e-mails start arriving in larger batches, whereas
bogofilter's load hasn't noticeably spiked yet.

I understand that the statistical (Bayesian-like) approach is best
suited for personal filtering and can't be used in an ISP's mail hub,
but I suppose that the mail coming into Debian lists is uniform
enough (even more so than most people's regular mail) to make this
approach very effective.

So I was just wondering: has this been discussed or tried out?

--
Tarmo Toikkanen
     - NP Solutions
     - tarmo@iki.fi
     - http://www.iki.fi/tarmo/


