Re: bayesian filter training question

To: debian-user@lists.debian.org
Subject: Re: bayesian filter training question
From: "Roberto C. Sanchez" <roberto@familiasanchez.net>
Date: Fri, 30 Sep 2005 07:36:05 -0400
Message-id: <20050930113605.GA24152@miami.familiasanchez.net>
Mail-followup-to: "Roberto C. Sanchez" <roberto@familiasanchez.net>, debian-user@lists.debian.org
In-reply-to: <200509300914.54021.kjetil@kjernsmo.net>
References: <20050929195116.GB27670@miami.familiasanchez.net> <200509300914.54021.kjetil@kjernsmo.net>

On Fri, Sep 30, 2005 at 09:14:53AM +0200, Kjetil Kjernsmo wrote:
> On Thursday 29 September 2005, 21:51, Roberto C. Sanchez wrote:
> > So, I finally decided to get with the 20th century and install
> > spamassassin (actually spampd hooked through postfix) to do site-wide
> > spam filtering for my server. 
> 
> Yiiihaaa!
> 
> > My question is this.  As I am training
> > it with sa-learn, is it (good|bad|indifferent) to train it on spam
> > that has already been flagged as spam?  That is, will this reinforce
> > spamassassin's notion of spam or ruin it?
> 
> No, that's fine. In fact, SA has this autowhitelist concept that does
> exactly that (it's not really a whitelist, though, more an "evening out
> weird things that may happen"; I'm not using it).
> 
> You should have a good look at bayes_ignore_header, so that it won't
> train on headers that another filter has already added to spam. SA is
> pretty good at this itself, but if you see a lot of spam that has been
> filtered elsewhere, be sure to use it.
> 
> I'm guessing that you, like me, are doing this for your family. In that 
> case, I have found that it is quite sufficient to train a single 
> database with the spam and ham of the entire family. If you have more 
> diverse users, you would probably need to have a per-user 
> configuration. For example, a friend of mine has an uncle who is a 
> psychiatrist working with people with gambling obsessions, and SA was 
> pretty catastrophic for him until he got a per-user config.
> 
> Finally, I found that SA, in its default 3.0 form, was much too
> conservative about the assigned scores, so I have adjusted the scores
> of a bunch of rules. You'll get some experience with that in time, I
> guess. Also note that SA 3.1 has been released upstream.
> 
Cool.  Thanks for the quick informative reply.
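
For the archives, here is roughly how I read those suggestions as
local.cf settings. This is only a sketch: the file path is the usual
Debian location, X-Upstream-Spam-Status is a made-up header name
standing in for whatever an upstream filter adds, and the BAYES_99
value is purely illustrative, not something Kjetil recommended.

```
# /etc/spamassassin/local.cf  (assumed location)

# Keep Bayes from learning tokens out of a header that some other
# filter added before the mail got here.
# X-Upstream-Spam-Status is a placeholder header name.
bayes_ignore_header X-Upstream-Spam-Status

# Raise the score of a rule that seems too conservative by default.
# The value 4.0 is an example only; tune it against your own mail.
score BAYES_99 4.0
```

Running "spamassassin --lint" after editing should catch any syntax
mistakes before spampd picks up the change.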

-Roberto
-- 
Roberto C. Sanchez
http://familiasanchez.net/~roberto

References:
- bayesian filter training question
  - From: "Roberto C. Sanchez" <roberto@familiasanchez.net>
- Re: bayesian filter training question
  - From: Kjetil Kjernsmo <kjetil@kjernsmo.net>
