Re: Promoting your website with bulk-email
* KELEMEN Peter (fuji@debian.org) [20030702 16:58]:
[ Please Cc: me on followups since I'm no longer subscribed, thanks. ]
> From the results, it is clear to me that Bayesian spam filtering
> alone is still not good enough to catch most of spam. If time
> permits, I'll look into CRM114 and others.
Since the infamous SpamAssassin/Osirusoft incident, I have switched
over to bogofilter (starting with 0.14.5.2, regularly updated until
0.16.2) with no preliminary training. I chose to train it on my
regular mail inflow, accepting the burden of a lot of spam getting
through in the first couple of days. Well, I have to say I'm impressed.
Let the numbers speak for themselves:
Sampling period: 2003/08/27 -- 2003/10/27 (8 weeks)
Total incoming mails: 37822 (100.00%)
Total incoming ham: 31706 ( 83.83%)
Total incoming spam: 6116 ( 16.17%)
Number of spam: 6116 (100.00%)
Spam caught: 5439 ( 88.93%)
False negatives: 677 ( 11.07%)
False positives: 3 ( 0.05%)
Production period: 2003/10/27 -- 2004/01/14 (10 weeks)
Total incoming mails: 66967 (100.00%)
Total incoming ham: 55246 ( 82.50%)
Total incoming spam: 11721 ( 17.50%)
Number of spam: 11721 (100.00%)
Spam caught: 11163 ( 95.24%)
False negatives: 561 ( 4.79%)
False positives: 3 ( 0.03%)
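For anyone who wants to cross-check the percentages, here is a minimal Python sketch. It assumes every percentage in the lower half of each block (caught, false negatives, false positives) is taken relative to the number of spam messages in that period; the function name `rates` and its parameter names are made up for illustration and are not part of bogofilter.

```python
def rates(spam_total, caught, missed, false_positives):
    """Return each count as a percentage of spam_total, rounded to 2 places.

    Assumption: all three figures use the spam count as the base.
    """
    def pct(n):
        return round(100.0 * n / spam_total, 2)

    return {
        "caught": pct(caught),
        "missed": pct(missed),
        "false_positives": pct(false_positives),
    }


# Counts copied from the two periods above.
sampling = rates(spam_total=6116, caught=5439, missed=677, false_positives=3)
production = rates(spam_total=11721, caught=11163, missed=561, false_positives=3)

for name, result in (("sampling", sampling), ("production", production)):
    print(name, result)
```

The drop in the miss rate between the two periods is just the difference `sampling["missed"] - production["missed"]`.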
This supports my "theory" that my earlier bogofilter tests (done
while still using SpamAssassin in production) were flawed because
I trained it with a lot of *old* spam, which skewed the values in
the wrong direction.
Peter (now a happy bogofilter user)
--
.+'''+. .+'''+. .+'''+. .+'''+. .+''
Kelemen Péter / \ / \ / fuji@debian.org
.+' `+...+' `+...+' `+...+' `+...+'