
Re: Spamassassin



On Wednesday 08 October 2003 08:50, Stephan Seitz wrote:
> Is there a download possibility to get so much spam mails? Since I
> delete my spam, I don't have enough mails to train spamassassin.

There is, but I won't tell you because it wouldn't do you any good! :-)

For the Bayesian filter to be accurate, it has to be your spam and your 
ham. True, everybody's spam is quite similar, but there are 
differences, and somebody else's corpus would quite likely skew the 
statistics and be of little use.

Another point to emphasize is that it is equally important to train it 
with your own ham as it is to train it with your own spam, and in 
roughly equal numbers.

So, I would strongly recommend you just manually save the spam to a 
folder for some time now, and build your spam database from there. 
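To check whether the two folders have reached roughly equal numbers, something like the following might do. It is only a minimal sketch, assuming the folders are stored as mbox files; the function names and the 0.5 tolerance are made up for illustration and are not anything SpamAssassin prescribes.

```python
import mailbox


def count_messages(mbox_path):
    """Return the number of messages stored in an mbox file."""
    # create=False: fail loudly instead of silently creating an empty folder
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def is_roughly_balanced(n_spam, n_ham, tolerance=0.5):
    """True if the smaller corpus is at least `tolerance` times the larger.

    The default tolerance is an arbitrary illustrative choice.
    """
    if n_spam == 0 or n_ham == 0:
        return False
    return min(n_spam, n_ham) / max(n_spam, n_ham) >= tolerance
```

For example, `is_roughly_balanced(count_messages("spam.mbox"), count_messages("ham.mbox"))` would tell you whether it is worth collecting more of one kind before training (the file names here are placeholders).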

Basically, what I did to train it was to take about 2000 old spams I 
had gathered a long time ago, when I was actively whacking spammers, then 
use 1000 recent spams from my old account (that took me just a week to 
gather! :-( ). Then, on top of that, I fed it 250 spams from 
my new account. Nowadays, I feed it mainly false negatives. 
The problem with this approach is that the character of spam has 
changed substantially since I gathered those 2000, but it apparently 
works quite well for me anyway.

Finally, I have a few spamtraps (hehe, spambots, do your worst: 
href="mailto:aa0u@kjernsmo.net"), which I intend to use to train it 
automatically, once I've got my Exim4 server configured right! 

Then, I took all the legitimate mail from the saved folders at my old 
account, and fed it to the learner as ham. Then, I took a lot of recent 
list mail and fed it that too. Nowadays I regularly supply it with most of 
what lands in my inbox, that is, mail that is directed to me 
personally. 
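Feeding saved folders to the learner could be scripted roughly like this. It is a sketch only, assuming the folders are mbox files and that `sa-learn` is on the PATH; the helper names are hypothetical, and only the `--spam`, `--ham` and `--mbox` options of `sa-learn` are relied on.

```python
import subprocess


def sa_learn_command(kind, mbox_path):
    """Build the sa-learn command line for one mbox folder.

    kind is "spam" or "ham"; mbox_path is whatever folder you saved.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham', got %r" % (kind,))
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def train(kind, mbox_path):
    """Run sa-learn on the folder; raises CalledProcessError on failure."""
    subprocess.run(sa_learn_command(kind, mbox_path), check=True)
```

Calling `train("ham", "saved-mail.mbox")` and `train("spam", "spam.mbox")` would then cover both sides; the file names are placeholders for your own folders.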

It has made the Bayesian filter very accurate, but it has, as you can 
tell, taken a lot of effort. 

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
kjetil@kjernsmo.net  webmaster@skepsis.no  editor@learn-orienteering.org
Homepage: http://www.kjetil.kjernsmo.net/        OpenPGP KeyID: 6A6A0BBC


