
Re: Promoting your website with bulk-email



* KELEMEN Peter (fuji@debian.org) [20030702 16:58]:

[ Please Cc: me on followups since I'm no longer subscribed, thanks. ]

> From the results, it is clear to me that Bayesian spam filtering
> alone is still not good enough to catch most of spam.  If time
> permits, I'll look into CRM114 and others.

Since the infamous SpamAssassin/Osirusoft incident, I have switched
over to bogofilter (starting with 0.14.5.2, regularly updated until
0.16.2) with no preliminary training.  I chose to train it on my
regular mail inflow, at the cost of letting a lot of spam through in
the first couple of days.  Well, I have to say I'm impressed.  Let the
numbers speak for themselves:

Sampling period:	2003/08/27 -- 2003/10/27 (8 weeks)
Total incoming mails:	37822 (100.00%)
Total incoming ham:	31706 ( 83.83%)
Total incoming spam:	6116  ( 16.17%)

Number of spam:		6116  (100.00%)
Spam caught:		5439  ( 88.93%)
False negatives:	677   ( 11.07%)
False positives:	3     (  0.05%)



Production period:	2003/10/27 -- 2004/01/14 (10 weeks)
Total incoming mails:	66967 (100.00%)
Total incoming ham:	55246 ( 82.50%)
Total incoming spam:	11721 ( 17.50%)

Number of spam:		11721 (100.00%)
Spam caught:		11163 ( 95.24%)
False negatives:	561   (  4.79%)
False positives:	3     (  0.03%)
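
For anyone who wants to recompute the percentages, here is a minimal
Python sketch.  It assumes that every rate in the second block of each
table is taken relative to the number of spam; that is only my reading
of the production false-positive figure (3 out of 11721).  The helper
name `rates` is made up for illustration.

```python
def rates(spam_total, caught, missed, false_pos):
    """Return percentages relative to the total number of spam.

    Assumption: the false-positive rate uses the spam total as its
    denominator too, not the ham total.
    """
    return {
        "caught": 100.0 * caught / spam_total,
        "missed": 100.0 * missed / spam_total,
        "false_pos": 100.0 * false_pos / spam_total,
    }


# Counts copied from the two tables above.
sampling = rates(spam_total=6116, caught=5439, missed=677, false_pos=3)
production = rates(spam_total=11721, caught=11163, missed=561, false_pos=3)

for name, r in (("sampling", sampling), ("production", production)):
    print(
        f"{name:<11} caught {r['caught']:6.2f}%  "
        f"missed {r['missed']:6.2f}%  "
        f"false positives {r['false_pos']:6.2f}%"
    )
```

Dividing the false positives by the ham total instead would give a
much smaller number, so the choice of denominator matters when
comparing against other filters.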


This supports my "theory" that my earlier bogofilter tests (done
while still using SpamAssassin in production) were flawed because
I trained it with a lot of *old* spam, which skewed the values in
the wrong direction.

Peter (now a happy bogofilter user)

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \       /    fuji@debian.org
.+'         `+...+'         `+...+'         `+...+'         `+...+'


