on Sat, Nov 15, 2003 at 06:05:03AM -0800, Steve Lamb (grey@dmiyu.org) wrote:
> Karsten M. Self wrote:
> >SA has an "autolearn" feature, where mail scoring above 6, and below
> >0.1, will be "autolearned" as spam and ham.  That is, the Bayesian
> >classifier will train on these mails.
>
>     However these only are what SA would have caught already without the
> Bayesian score.  It discards that when autolearning to prevent a
> self-spiralling corruption of its database.  On any given pass through d-u,
> d-d, d-m or d-k I can get 50-60+% messages which were not learned by SA.
> That's a large amount to discard.

This isn't my understanding.  Remember that the training happens on both
sides of the scoring -- both ham and spam are used to train.  For words
which frequently appear in both classes of mail, the predictive score
will be low.  Terms appearing with greater exclusivity in one or the
other will have high absolute scores.  Over time, you'll have fewer
words which aren't predictive one way or the other, though some terms
may not predict _much_.

>     Does he need to feed every message to SA?  If he has autolearning
> turned on, no.  Should he feed samples in regularly?  Yes.

I'm not quite sure what you're saying here.

My sense:  the autolearning does training for you.  Explicitly training
on false positives/negatives corrects for terms that were misclassified
or not properly scored.  That should further improve accuracy, and be
more-or-less sufficient.

FWIW:  the score of a particular item of mail will change over time.
I'll occasionally come across misclassified spam in a folder
(particularly one I don't read regularly), check its spam score as noted
in the headers (below threshold), and then run 'spamc -c' to check the
current score.  Often it's now _over_ threshold.  I attribute this to
either automated or manual training of the Bayesian classifier.  The
differences are sometimes very marked -- headers note a score of 2-3,
spamc returns 8-16.

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    They that can give up essential liberty to obtain a little temporary
    safety deserve neither liberty nor safety.
    - Benjamin Franklin, 1759
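
A toy illustration of the point about tokens that show up in both ham and spam ending up with little predictive weight.  This is only a sketch under simplifying assumptions: it is not SpamAssassin's actual Bayes implementation, the sample messages are made up, and the names train() and token_score() are hypothetical.

```python
from collections import Counter


def train(messages):
    """Count, per token, how many messages contain it (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def token_score(token, spam_counts, ham_counts, n_spam, n_ham):
    """Smoothed spamminess of a token, shifted so 0 means 'not predictive'.

    Positive values lean spam, negative values lean ham.  The add-one
    smoothing is an assumption made for this sketch.
    """
    p_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham) - 0.5


# Made-up training data: two spam messages, two ham messages.
spam = ["buy cheap pills now", "cheap pills for you"]
ham = ["meeting notes for you", "debian kernel notes"]

spam_counts, ham_counts = train(spam), train(ham)

# "you" appears equally in both classes; "pills" only in spam; "kernel" only in ham.
scores = {
    tok: token_score(tok, spam_counts, ham_counts, len(spam), len(ham))
    for tok in ("you", "pills", "kernel")
}
```

With this data, "you" scores 0 (it appears equally often on both sides), "pills" scores positive and "kernel" scores negative.  Training on more mail from either class shifts these values, which fits with the same message getting a different score later on.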