
Re: Favorite anti-spam tool



On Wed, Apr 30, 2003 at 09:21:20PM -0400, Nori Heikkinen wrote:
> Cool, I just upgraded to 2.53, and it seems better.  Can someone
> explain to me how a Bayesian filter would work in a static context?
> Does it train itself based on the threshold you provide?  My
> impression (though this could be wrong, as I've only ever used
> spamassassin) is that on other commonly-used Bayesian spamfilters, you
> have to manually train it on x number of emails before it learns what
> you consider spam and what you don't.  How does spamassassin -- which
> is procmail-based -- train itself?

You can train it by hand using sa-learn. If you don't, it auto-learns
based on its own scores: anything below auto_learn_threshold_nonspam
(default -2.0) gets auto-learned as ham, and anything above
auto_learn_threshold_spam (default 15.0) gets auto-learned as spam.
You'll see 'autolearn=ham' or 'autolearn=spam' in the headers when this
happens, so you can correct it with sa-learn if need be.

This sounded like a dodgy approach to me, but in practice it seems to
work well. Most of the mail that SpamAssassin classifies wrongly scores
somewhere between -2.0 and 15.0, and so doesn't get auto-learned either
way. And you can always turn off auto-learning if you really don't like
it.
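
To make the threshold rule concrete, here is a rough Python sketch of
the decision as described above. It is not SpamAssassin's own code: the
function name and constants are made up for illustration, only the two
default values come from the settings mentioned earlier, and the real
implementation may apply extra conditions that are glossed over here.

```python
from typing import Optional

# Defaults quoted above for auto_learn_threshold_nonspam and
# auto_learn_threshold_spam. The names below are hypothetical.
AUTO_LEARN_THRESHOLD_NONSPAM = -2.0
AUTO_LEARN_THRESHOLD_SPAM = 15.0


def autolearn_decision(
    score: float,
    nonspam_threshold: float = AUTO_LEARN_THRESHOLD_NONSPAM,
    spam_threshold: float = AUTO_LEARN_THRESHOLD_SPAM,
) -> Optional[str]:
    """Return 'ham', 'spam', or None for a message's overall score."""
    if score < nonspam_threshold:
        # Confidently not spam: feed it to the Bayes database as ham.
        return "ham"
    if score > spam_threshold:
        # Confidently spam: feed it to the Bayes database as spam.
        return "spam"
    # Anything in between is too uncertain to learn from automatically.
    return None


if __name__ == "__main__":
    for score in (-3.1, 5.0, 20.0):
        print(score, autolearn_decision(score))
```

A message scoring 5.0, which is where most misclassifications end up,
falls in the middle band and is left alone unless you train on it
yourself with sa-learn.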

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


