
Re: Promoting your website with bulk-email



* Anthony DeRobertis (asd@suespammers.org) [20021028 12:10]:

> On Mon, 2002-10-28 at 11:58, Shawn McMahon wrote:
> > Sounds to me like you should be passing it through Bogofilter
> > first, and then through SpamAssassin.  63% of your spam would
> > then no longer be going through the slower process.

> Depends on what 63% means --- did it miss 37% of the spam, or
> did it falsely flag 37% of messages as spam? Or more likely, a
> combination of both.

Since all 200 of the test messages were human-verified spam, a 63%
success rate means bogofilter classified 63% of them as spam and 37%
as non-spam (i.e. I was testing only for false negatives).
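To make the arithmetic concrete, here is a minimal sketch of what a spam-only test can and cannot measure. The count of 126 caught messages is an assumption derived from 63% of 200 (the percentage may have been rounded), and the function name is made up for illustration.

```python
def spam_only_summary(total_spam, caught):
    """Summarize a test run on a corpus that contains only verified spam."""
    missed = total_spam - caught  # spam classified as non-spam
    return {
        "caught": caught,
        "missed": missed,
        "detection_rate": caught / total_spam,
        "false_negative_rate": missed / total_spam,
    }

# Assumed figures: 63% of 200 verified spam messages is 126 caught.
summary = spam_only_summary(total_spam=200, caught=126)
print(summary)
```

A corpus with no legitimate mail in it says nothing about false positives; measuring those would need a separate run over verified non-spam.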

Clint Adams was kind enough to test 600 of my spam messages against
the bogofilter databases he mentions elsewhere in this thread, and he
got an 87-88% success rate.  My databases were trained with ten times
as many spams, yet I got lower results.  I have two possible
explanations for this:

a) bogofilter is very much in flux and has evolved a lot since I
first tested it about two months ago.

b) bogofilter does not trim its word lists as ifile does.  I haven't
looked at the source, but judging from the huge Berkeley DB files,
that is the case.  Can someone confirm this?  If it is true, then we
have a classic case of over-training, as observed with neural nets
and combined probability filters, which degrades overall performance.
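For illustration, this is roughly what I mean by trimming a word list. It is only a sketch under my own assumptions: I have not checked how ifile actually prunes or how bogofilter stores its counts, and the function name, the threshold and the sample counts are all made up.

```python
def trim_word_list(counts, min_count=2):
    """Drop tokens seen fewer than min_count times.

    counts maps token -> number of training messages containing it.
    Hypothetical pruning rule, not taken from ifile or bogofilter.
    """
    return {tok: n for tok, n in counts.items() if n >= min_count}

# Made-up training counts, for illustration only.
word_counts = {"viagra": 412, "mortgage": 97, "xq7zpl": 1, "unsubscribe": 230}
trimmed = trim_word_list(word_counts, min_count=2)
print(sorted(trimmed))
```

Tokens that occur only once are usually noise (random strings, message IDs), so dropping them keeps the database small and avoids giving them any weight in the combined probability.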

Peter

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \       /    fuji@debian.org
.+'         `+...+'         `+...+'         `+...+'         `+...+'


