On Sat, 30 Aug 2003 23:18:59 -0400
Tom Allison <tallison@tacocat.net> wrote:
> From what I'm seeing in the logs and in the docs is that Bayesian filtering
> is enabled by default.  But it is not used until there is (IIRC) 200 emails
> of both spam and ham built into the database.

> Did I miss something?  How does it get to that 200 mark of ham and spam?

    While autolearning is turned on by default, with a threshold of -2 for ham
and +15 for spam, this just reinforces the default SA rules.  I.e., anything SA
would have let through anyway now gets a negative score from the Bayesian
filter, and anything SA would have rejected gets a higher score thanks to
Bayes.  Furthermore, the autolearn thresholds discount the Bayesian modifier.
So if a piece of mail scores exactly 15 before Bayes kicks in, it won't be
autolearned.

    sa-learn lets you feed messages to the Bayesian filter to learn from.
Bayes doesn't need to be active for it to learn.  This lets you get the filter
trained a bit faster.  Furthermore, it helps the filter adjust to messages that
SA would otherwise misclassify as either ham or spam.  The bounces and virus
messages are a prime example.  Ever since they started coming in I've been
feeding them to the filter.  Now most are rejected at SMTP time, even though
the SA team has not released another version with updated rules to address
those types of messages.

    I don't send every message to the filter.  I let autolearn do its job.  But
I do make it a point every now and again to feed it 20-30 messages from random
lists and my inbox.  About half of those were not autolearned, so it keeps my
filters fresh.  I do feed all spam into the filter to ensure that side is
definitely kept up to date.

-- 
         Steve C. Lamb         | I'm your priest, I'm your shrink, I'm your
       PGP Key: 8B6E99C5       | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
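
For anyone who finds the threshold rule above easier to read as code, here is a rough Python sketch of the autolearn decision as described in this message.  It is not SpamAssassin's actual implementation.  The function and constant names are made up for illustration, the -2 and +15 values are simply the ones quoted above, and the strict comparison on the ham side is an assumption (the message only says that a score of exactly 15 is not learned as spam).

```python
# Illustrative only: names are hypothetical, not SpamAssassin identifiers.
HAM_THRESHOLD = -2.0    # value quoted in the message above
SPAM_THRESHOLD = 15.0   # value quoted in the message above


def autolearn_decision(total_score, bayes_score=0.0):
    """Return 'ham', 'spam', or None (do not autolearn).

    The Bayes contribution is subtracted first, so the decision rests
    only on what the non-Bayes rules scored.
    """
    base_score = total_score - bayes_score
    if base_score > SPAM_THRESHOLD:
        # Strictly above the threshold: exactly 15 is NOT learned.
        return "spam"
    if base_score < HAM_THRESHOLD:
        # Assumption: the ham side is also a strict comparison.
        return "ham"
    return None


if __name__ == "__main__":
    # 18 total, of which 3 came from Bayes: base is exactly 15, not learned.
    print(autolearn_decision(18.0, bayes_score=3.0))
    # 15.5 from the regular rules alone: learned as spam.
    print(autolearn_decision(15.5))
```

Mail that lands between the two thresholds is never autolearned under this rule, which is why hand-feeding a sample with sa-learn still matters.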