
Re: Spam in the lists out of control



On Tue, May 11, 2004 at 01:21:18PM -0700, Blars Blarson scribbled:
> In article <20040510122720.GA3461@beowulf.thanes.org> grendel@debian.org writes:
> >Both dspam and crm114 boast over 99%
> >accuracy in spotting spam, now that would be really neat if we had that
> >level of protection around here.
> 
> 99% isn't good enough.  That's about what the current spamassassin
> setup on lists.debian.org is doing, and that is what all the complaints
> are about.
Quoting the DSPAM page:

--- QUOTE ---
DSPAM (as in De-Spam) is an extremely scalable, open-source statistical
hybrid anti-spam filter. While most commercial solutions only provide a mere
95% accuracy (1 error in 20), a majority of DSPAM users frequently see
between 99.95% (1 error in 2000) all the way up to 99.985% (1 error in
7000). DSPAM is currently effective as both a server-side agent for UNIX
email servers and a developer's library for mail clients, other anti-spam
tools, and similar projects requiring drop-in spam filtering. DSPAM has been
implemented on many large and small scale systems with the largest systems
being reported at about 125,000 mailboxes.
--- QUOTE ---
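
For reference, the "1 error in N" figures in that quote follow directly from the accuracy percentages. A minimal Python sketch of the conversion (the function name `errors_one_in` is made up for illustration and has nothing to do with DSPAM itself):

```python
def errors_one_in(accuracy_percent: float) -> float:
    """Return N such that the given accuracy means about 1 error in N messages."""
    error_rate = 1.0 - accuracy_percent / 100.0
    if error_rate <= 0:
        raise ValueError("accuracy must be below 100%")
    return 1.0 / error_rate


# Accuracy figures mentioned in this thread and in the DSPAM quote.
for acc in (95.0, 99.0, 99.95, 99.985):
    print(f"{acc}% accuracy is roughly 1 error in {errors_one_in(acc):.0f}")
```

Note that 99.985% works out to about 1 error in 6667, which the DSPAM page rounds to 7000.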

Quoting the crm114 page:

--- QUOTE ---
 I measured my own accuracy to be around 99.84%, by classifying the same set
of about 3000 messages twice over a period of about a week, reading each
message from the top until I feel "confident" of the message status, (one
message per screen unless I want more than one screen to decide on a
message.) and doing the classification in small batches with plenty of
breaks and other office tasks to avoid fatigue. Then I diff()ed the two
passes to generate a result. Assuming I never duplicate the same mistake, I,
as an unassisted human, under nearly optimal conditions, am 99.84%
accurate.).

Current filtering speed is about 120 kbyte/sec for a moderate (P-iii 1.4
GHz) mailserver.

Old News: New hackery gives us a tremendous speedup- we're now running about
4 times faster than SpamAssassin while still retaining our high ( better
than 99.9% ) accuracy after training. From Sep. 1 through 14 2003, I had
ZERO errors on over 2500 emails on my live incoming email stream.

Old News Flash: For the month of Nov 2002 : accuracy is now over 99.9% on my
live incoming email mix. That's 5849 messages, 1931 spam and 3914 nonspam,
and only 4 spams got through.
--- QUOTE ---

And, no, I haven't tested them myself (yet) but I have no reason not to
believe what the authors of the programs say about their performance and
accuracy.
 
> If you can't show less than 0.1% false negatives and 0.01% false
> positives, it isn't worth bothering to try switching.
Even if you don't have to spend $$$ on new hardware? To me it is a benefit
if you can upgrade some software, keep the cash, and achieve at least the
same efficiency/accuracy. But I might be wrong, of course.
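
To make the proposed thresholds concrete, here is a rough Python sketch of how such a check could be computed from raw counts. The function names and constants are only illustrative. The false negative rate is taken here relative to the number of spam messages and the false positive rate relative to the number of nonspam messages, which is an assumption about how the percentages are meant. The only counts plugged in are the ones from the crm114 Nov 2002 quote (4 missed spams out of 1931); that quote gives no false positive count, so none is assumed.

```python
MAX_FALSE_NEGATIVE_PCT = 0.1   # threshold proposed above: spam that gets through
MAX_FALSE_POSITIVE_PCT = 0.01  # threshold proposed above: good mail flagged as spam


def rate_percent(errors: int, total: int) -> float:
    """Error rate as a percentage of the given total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


def meets_thresholds(missed_spam: int, total_spam: int,
                     blocked_ham: int, total_ham: int) -> bool:
    """True if both error rates are below the proposed limits."""
    fn = rate_percent(missed_spam, total_spam)
    fp = rate_percent(blocked_ham, total_ham)
    return fn < MAX_FALSE_NEGATIVE_PCT and fp < MAX_FALSE_POSITIVE_PCT


# Counts from the crm114 "Old News Flash" quoted above.
fn_rate = rate_percent(4, 1931)
print(f"missed spam: {fn_rate:.3f}% of all spam")
```

Measured against spam only, 4 misses in 1931 is about 0.21%, while the same 4 errors measured against all 5849 messages is under 0.1%. So whether a filter clears the bar depends a lot on which denominator is used.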

regards,

marek


