Re: Promoting your website with bulk-email
* Marcelo E. Magallon (mmagallo@debian.org) [20021015 07:25]:
> My problems with bogofilter come from the fact that most of the
> mail I get is written in English, with a small percentage (less
> than 5%, I'd dare say, in Spanish and German) and most of the
> SPAM I get nowadays is written in English, German and *gasp*
> Chinese (or Korean, or whatever). And some in Spanish, too.
I have a similar experience, and additionally the bogofilter data
files (BerkeleyDB) are just *huge*. I trained it with a corpus of
25000 spam messages and 20000 non-spam messages, then gave it a
test run on 200 previously unseen messages. The success rate was
63%, far too low, and the databases exceeded 10M. SpamAssassin
had a 99.5% success rate, but it is ridiculously slow compared to
Bayesian filters.
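
For anyone who wants to repeat this kind of measurement, here is a
minimal sketch of how a success rate on held-out mail can be
computed. It is not the script used for the numbers above; the
names success_rate, toy_classifier and held_out are made up for
illustration, and the toy classifier merely stands in for a real
filter. For scale: 63% of 200 messages is 126 classified correctly,
and 99.5% would be 199.

```python
from typing import Callable, Iterable, Tuple


def success_rate(classify: Callable[[str], bool],
                 labelled: Iterable[Tuple[str, bool]]) -> float:
    """Return the fraction of messages whose predicted label
    (True = spam) matches the known label."""
    total = 0
    correct = 0
    for text, is_spam in labelled:
        total += 1
        if classify(text) == is_spam:
            correct += 1
    if total == 0:
        raise ValueError("no messages to evaluate")
    return correct / total


def toy_classifier(text: str) -> bool:
    # Hypothetical stand-in for a real filter: flags one keyword only.
    return "viagra" in text.lower()


# Tiny hand-made held-out set (assumption, not real data).
held_out = [
    ("Cheap VIAGRA now", True),
    ("Meeting at 10", False),
    ("Compre ahora", True),            # non-English spam the toy rule misses
    ("Re: bogofilter db size", False),
]

rate = success_rate(toy_classifier, held_out)
print(f"success rate: {rate:.1%}")
```

The important part is that the held-out messages were never seen
during training; otherwise the rate says little about new mail.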
Peter
--
.+'''+. .+'''+. .+'''+. .+'''+. .+''
Kelemen Péter / \ / \ / fuji@debian.org
.+' `+...+' `+...+' `+...+' `+...+'