
Re: Promoting your website with bulk-email



* Mark Brown (broonie@sirena.org.uk) [20021029 18:35]:

> Another possibility: in my experience bogofilter seems to work
> better when it has seen very much more non-spam than spam
> e-mail.  As I recall your data set was about evenly split
> between the two.

Well, I just can't get enough ham. :-) Recently I did the same
test again with 50k spam and 15k ham.  Both SpamAssassin and
bogofilter were trained with the full spam corpus and full
ham corpus, then run against a set of 1387 previously unseen,
human-verified spam messages.  The results are:
 
bogofilter: 306 (22%) positives, 1081 (78%) false negatives
spamassassin: 580 (42%) positives, 807 (58%) false negatives

After training bogofilter with an additional 30k ham messages, the
result improved somewhat:

bogofilter: 631 (45%) positives, 756 (55%) false negatives
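
For anyone who wants to check the arithmetic, here is a minimal sketch
that recomputes the percentages from the raw counts over the 1387 test
messages.  The dictionary keys and the helper name `rates` are only
illustrative and are not part of the actual test setup.

```python
TOTAL = 1387  # previously unseen, human-verified spam messages

# Number of messages each filter flagged as spam, as reported above.
# The keys are illustrative labels only.
caught = {
    "bogofilter": 306,
    "spamassassin": 580,
    "bogofilter (+30k ham)": 631,
}

def rates(hits, total=TOTAL):
    """Return (caught %, missed %) rounded to whole percent."""
    missed = total - hits
    return round(100 * hits / total), round(100 * missed / total)

for name, hits in caught.items():
    pos, neg = rates(hits)
    print(f"{name}: {hits} ({pos}%) positives, "
          f"{TOTAL - hits} ({neg}%) false negatives")
```

Since every message in the test set is known spam, each miss counts as
a false negative, and the two percentages on a line should sum to
roughly 100.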

Training SpamAssassin with the same 30k additional ham failed with
an out-of-memory (OOM) error on a P4 machine with 256 MB of RAM.

From the results, it is clear to me that Bayesian spam filtering
alone is still not good enough to catch most spam.  If time
permits, I'll look into CRM114 and others.

Peter

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \       /    fuji@debian.org
.+'         `+...+'         `+...+'         `+...+'         `+...+'


