On Sat, 30 Aug 2003 23:40:13 -0400
Tom Allison <tallison@tacocat.net> wrote:
> It may be turned on in the config files, but I am guessing that the code is
> skipping the bayesian score contribution until the mail count gets to 200 on
> each side (ham/spam).
Right.
> I just grabbed a lot of email I had already and fed it into the sa-learn.
> I think I have enough now that it is working.
You can tell by looking at the headers and seeing if a BAYES_xx test shows
up. The xx is the approximate probability range in which the Bayesian filter
places that particular piece of mail. For example, here's the score from the
message of yours I am responding to:
X-Spam-Status: No, hits=-3.6 required=5.0
    tests=BAYES_10,NO_REAL_NAME
    version=2.55
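
If you would rather check for the BAYES_xx test from a script than by eye, here is a minimal Python sketch. The function name bayes_tests is made up for illustration, and it assumes the header value looks like the 2.55-style one above (a comma-separated tests= list, possibly folded across lines); other SpamAssassin versions may lay the header out differently.

```python
import re

def bayes_tests(status_header):
    """Return the BAYES_xx test names found in an X-Spam-Status value.

    Hypothetical helper: assumes a 'tests=A,B,C' list as in the header above.
    """
    # Unfold continuation lines into a single space-separated string.
    unfolded = " ".join(status_header.split())
    # Some headers fold the list after a comma; close up those gaps.
    unfolded = re.sub(r",\s+", ",", unfolded)
    match = re.search(r"tests=(\S+)", unfolded)
    if not match:
        return []
    names = [name for name in match.group(1).split(",") if name]
    return [name for name in names if re.fullmatch(r"BAYES_\d+", name)]

header = """No, hits=-3.6 required=5.0
    tests=BAYES_10,NO_REAL_NAME
    version=2.55"""

print(bayes_tests(header))  # ['BAYES_10']
```

An empty list means no Bayes test fired, which is what you would expect to see until the filter has learned enough ham and spam.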
So the Bayesian filter (classifier?) thinks it is 10-??% (forget the upper
range) likely to be spam. Ah, here it is. From 23_bayes.cf...
body BAYES_10 eval:check_bayes('0.10', '0.20')
...10 to 20% which gives it a score of...
score BAYES_10 0 0 -5.300 -4.701
...-4.701 based on my setup. The four values are score sets: the first is
used with Bayes and network checks both disabled, the second with network
checks only, the third with Bayes only, and the fourth with both enabled,
which is what I run. NO_REAL_NAME nets the message...
score NO_REAL_NAME 0.993 0.820 1.137 1.149
...1.149. -4.701 + 1.149 = -3.552, which rounds to the -3.6 in the header.
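
The same arithmetic as a small Python sketch, in case it helps to see how a score is picked from the four columns. The names RULE_SCORES, pick_score and total are made up for illustration; the numbers are copied from the two score lines above, and the column order (neither, network only, Bayes only, both) follows the SpamAssassin documentation for the score directive. This is not how SpamAssassin itself is implemented, just the lookup and sum.

```python
# Scores copied from the two 'score' lines quoted above.
# Columns: (no Bayes/no network, network only, Bayes only, Bayes + network)
RULE_SCORES = {
    "BAYES_10": (0, 0, -5.300, -4.701),
    "NO_REAL_NAME": (0.993, 0.820, 1.137, 1.149),
}

def pick_score(scores, bayes_enabled, network_enabled):
    """Select the column that applies to the current configuration."""
    index = (2 if bayes_enabled else 0) + (1 if network_enabled else 0)
    return scores[index]

def total(tests, bayes_enabled=True, network_enabled=True):
    """Sum the applicable score for each test that fired."""
    return sum(
        pick_score(RULE_SCORES[name], bayes_enabled, network_enabled)
        for name in tests
    )

hits = total(["BAYES_10", "NO_REAL_NAME"])
print(f"{hits:.1f}")  # -3.6
```

With network checks turned off the third column applies instead, so the same two tests would add up to -5.300 + 1.137.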
> I'm not sure, I just kind of fiddled with it a few times in the early hours
> and got it working.
Yeah, it just takes a little bit to kick in. Once it does, the difference
is dramatic if you track the scores. Average ham for me is around -3 and
average spam is closer to 12 to 15. That gives me a lot of latitude when
configuring sa-exim to reject things at SMTP.
--
Steve C. Lamb | I'm your priest, I'm your shrink, I'm your
PGP Key: 8B6E99C5 | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------