On Thu, Aug 28, 2003 at 03:09:48PM +0200, Andreas Metzler (ametzler@downhill.at.eu.org) wrote:
> Karsten M. Self <kmself@ix.netcom.com> wrote:
> [...]
> > SpamAssassin achieves a false-positive rate (non-spam reported as spam)
> > of 5% with a default threshold of 5. This can be dramatically improved
> > using a whitelist, to ~98% in my experience. This is not the best
> > performance of all filters, so makes a somewhat generous threshold.
>
> > http://www.spamassassin.org/dist/rules/STATISTICS.txt
> > http://freshmeat.net/articles/view/964/
>
> > So a spam-reduction system user would at worst see a typical rate of 2%
> > of spam to be manually disposed of.
> [...]
>
> You are mixing up percentages. "5% non-spam reported as spam" ... can
> be ... improved to ~98% ...

Correct. And yes, I was thinking "false-negative": spam not flagged as
spam.

What I meant to say was this:

  - Currently feasible content-based filters plus whitelists can achieve
    a false-negative rate of about 2% (spam passing to the inbox),
    according to independent tests.

  - A C-R system should then target having no more than 2% of the
    challenges it sends be misdirected (based on spoofed headers, etc.).
    At this rate, it's still transferring burden inappropriately, but at
    a level that matches a reasonable-case technological alternative.
    This also serves a secondary goal, in the interests of C-R
    proponents: keeping the incidence of false challenges low enough
    that recipients would be likely to respond to a challenge.

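
The target in the second point amounts to a simple rate comparison. A
minimal sketch in Python, where the function names and any counts you
feed in are hypothetical; only the 2% benchmark comes from the
discussion above:

```python
# Benchmark from the discussion: roughly 2% of spam still reaches the
# inbox with a content-based filter plus whitelist.
FILTER_FALSE_NEGATIVE_RATE = 0.02


def misdirected_rate(challenges_sent: int, misdirected: int) -> float:
    """Fraction of challenges sent to someone who never wrote the mail."""
    if challenges_sent <= 0:
        raise ValueError("challenges_sent must be positive")
    if not 0 <= misdirected <= challenges_sent:
        raise ValueError("misdirected must be between 0 and challenges_sent")
    return misdirected / challenges_sent


def meets_target(challenges_sent: int, misdirected: int,
                 target: float = FILTER_FALSE_NEGATIVE_RATE) -> bool:
    """True if the C-R system misdirects no more than the target rate."""
    return misdirected_rate(challenges_sent, misdirected) <= target
```

For instance, meets_target(1000, 20) holds, while meets_target(1000, 21)
does not.
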
> When I last checked my personal rate with spamassassin 2.55 with
> default rules and no DNS lists or razor (but including a rather well
> trained bayesian filter) and a default threshold of 5, I came up with
> these numbers[1]:
> * 0% false positives, i.e. ham sorted into the spam folder
> * 10% of the spam was not recognized as such and I had to filter it
> out by hand.

I use a whitelisting system. It's based on Lars Wirzenius's spamfilter
package; my local addition is a shell script that scans messages for
senders to add to the white, black, grey, or spam lists. Mail from
previously unknown senders ends up in a "grey" box. The principle is the
same as C-R, except that the assessment is done by me rather than by a
third party.

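
The sorting step can be sketched roughly as follows. This is not the
spamfilter package's code or my shell script, just a Python illustration
under assumptions: the function names, the in-memory sets, and the
lookup order are all hypothetical, and only the list names and the
"grey" box for unknown senders come from the description above.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_of(raw_message: str) -> str:
    """Return the lowercased address from the From: header ('' if absent)."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    return addr.lower()


def triage(raw_message: str, lists: dict) -> str:
    """Pick a mailbox for a message based on which list knows the sender.

    `lists` maps a list name ("white", "black", "spam") to a set of
    lowercased addresses. Senders on no list go to the "grey" box for
    manual review. The lookup order here is an assumption.
    """
    sender = sender_of(raw_message)
    for name in ("white", "black", "spam"):
        if sender in lists.get(name, set()):
            return name
    return "grey"
```

Anything landing in "grey" is then reviewed by hand and the sender added
to one of the lists, which is the manual counterpart of a challenge.
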
Peace.
--
Karsten M. Self <kmself@ix.netcom.com> http://kmself.home.netcom.com/
What Part of "Gestalt" don't you understand?
Verio webhosting? Guaranteed downtime:
http://www.wired.com/news/politics/0,1283,57011,00.html
http://www.dowethics.com/r/environment/freedom.html