
Re: SpamAssassin weightings - am I missing something?



Jonathan Matthews wrote:
> Quick Spamassassin question:
> 
> I've got SpamAssassin 2.43 installed, and it's working well.
> 
> However, I noticed the two lines quoted below in the altered body of 
> some spam that it caught recently:
> 
> SPAM: EXCUSE_16          (-0.3 points) BODY: I wonder how many emails they sent in error...
> SPAM: EXCUSE_14          (-0.2 points) BODY: Tells you how to stop further spam
> 
> It seems strange to me that these two reasons should /decrease/ the 
> probability of the email being spam.
> 
> I know that the weightings attached to different rules are 
> user-definable, so I'm not asking "how do I stop this behaviour" - I can 
> easily go and redefine the weights.
> 
> I'd just like to get some confirmation that these weightings are wrong.  
> It's the stock install of SpamAssassin in testing, with no alterations 
> made to the config at all.  Should I file a bug, change my own 
> weightings or go away in shame, having made a fool of myself publicly?

The deal is that SpamAssassin's scores are generated using a genetic
algorithm. The developers "breed" scores against a corpus of known spam
and non-spam: they start with random scores, mutate them up or down,
see how each set of scores performs, and let the winning mutations
thrive. The aim, of course, is to get as few false positives as possible
while still catching as much spam as possible. So the scores are not
something hand-tweaked by a human.
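
To make the idea concrete, here is a rough sketch of that kind of
breeding loop. It is a toy, not SpamAssassin's actual scoring code: the
corpus, the rule names other than EXCUSE_14, the threshold of 5.0, and
the cost weighting are all assumptions made up for illustration.

```python
import random

# Hypothetical rule names; only EXCUSE_14 comes from the report above.
RULES = ["FORGED_HEADER", "FREE_OFFER", "EXCUSE_14"]
THRESHOLD = 5.0  # assumed spam threshold for this toy

# Made-up corpus: (rules that fired, is the message really spam?)
CORPUS = [
    (("FORGED_HEADER", "EXCUSE_14"), True),
    (("FORGED_HEADER",), True),
    (("FREE_OFFER", "FORGED_HEADER"), True),
    (("FREE_OFFER", "EXCUSE_14"), False),
    (("EXCUSE_14",), False),
    ((), False),
]


def cost(scores):
    """Lower is better. False positives are weighted 10x missed spam."""
    false_pos = missed = 0
    for hits, is_spam in CORPUS:
        flagged = sum(scores[r] for r in hits) >= THRESHOLD
        if flagged and not is_spam:
            false_pos += 1
        elif not flagged and is_spam:
            missed += 1
    return 10 * false_pos + missed


def evolve(generations=200, population=20, seed=0):
    """Start from random scores, keep the best half, mutate copies of it."""
    rng = random.Random(seed)
    pop = [{r: rng.uniform(-5, 5) for r in RULES} for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: population // 2]
        children = []
        for parent in survivors:
            child = dict(parent)
            rule = rng.choice(RULES)
            child[rule] += rng.gauss(0, 1)  # nudge one score up or down
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)


if __name__ == "__main__":
    best = evolve()
    print(cost(best), best)
```

Because the survivors are carried over unchanged, the best cost can
never get worse from one generation to the next. Nothing in the loop
cares whether a score ends up positive or negative.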

Sometimes it turns out that making a score negative reduces the number
of false positives without catching any less spam, at least on their
corpus. And the SA guys, rightly or wrongly, trust their GA to get it
right, and leave these negative scores in. I have mixed feelings about
this, but it seems to work.
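
A small worked example of how that can happen, with invented numbers:
the rule names other than EXCUSE_14, every score except the -0.2 quoted
above, the two messages, and the threshold of 5.0 are assumptions for
illustration only.

```python
THRESHOLD = 5.0  # assumed spam threshold

# Hypothetical scores for rules other than EXCUSE_14.
BASE_SCORES = {"CLICK_HERE": 2.6, "FREE_OFFER": 2.5, "FORGED_HEADER": 3.0}

# Made-up messages: (rules that fired, is the message really spam?)
MESSAGES = [
    # A legitimate newsletter that explains how to unsubscribe.
    (("CLICK_HERE", "FREE_OFFER", "EXCUSE_14"), False),
    # Real spam that hits the same rules plus a forged header.
    (("CLICK_HERE", "FREE_OFFER", "FORGED_HEADER", "EXCUSE_14"), True),
]


def errors(excuse_score):
    """Return (false positives, missed spam) for a given EXCUSE_14 score."""
    scores = dict(BASE_SCORES, EXCUSE_14=excuse_score)
    false_pos = missed = 0
    for hits, is_spam in MESSAGES:
        flagged = sum(scores[r] for r in hits) >= THRESHOLD
        if flagged and not is_spam:
            false_pos += 1
        elif not flagged and is_spam:
            missed += 1
    return false_pos, missed


print(errors(0.0))   # newsletter totals about 5.1 and is wrongly flagged
print(errors(-0.2))  # newsletter totals about 4.9; spam still totals about 7.9
```

With EXCUSE_14 at 0.0 the newsletter is a false positive; at -0.2 it
slips under the threshold while the spam, which trips enough other
rules, is still caught.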

-- 
see shy jo
