Re: Spam in the lists out of control

On Sun, May 09, 2004 at 11:57:17PM -0400, Duncan Findlay scribbled:
> On Mon, May 10, 2004 at 04:09:33AM +0200, Marek Habersack wrote:
> > On Sun, May 09, 2004 at 06:44:36PM +0200, Eike zyro Sauer scribbled:
> > > Andrew Lau schrieb:
> > > > Has debian.org's Spamassassin Bayesian database been poisoned? If so,
> > > > would flushing the database at random intervals be enough to keep it
> > > > useful, or would it just let too much spam in after each flush?
> > > 
> > > I'd "donate" 6000 spam mails, if this helps.
> > I could add my 14845 spams, too :)
> Pfff... you can have my 63,286 spams if you really want, but it won't
> really help you. The thing with a Bayesian database is that the mail
> it's trained on needs to be similar to the mail it will be tested
> against.
Most of my spam comes from the debian lists, so I would say it is similar
enough to the traffic down here.
> For what it's worth, empirical evidence indicates that SpamAssassin's
> Bayesian database is difficult to poison, since it's difficult for
> spammers to pick words that are learned as non-spammy (since everyone
> has their own set of non-spammy words). But, since lists.debian.org
> doesn't use bayes, this point is moot.
Why is SpamAssassin thought to be the only option? SA is a CPU/memory hog;
it can easily kill even a fairly powerful machine, and there _are_
alternatives to it. One option is dspam, as I pointed out in the other
post. Another is crm114, which also uses language classification and is
already packaged for Debian. Beyond those, there is a whole host of
Bayesian filter programs written in a language suited to heavy-duty tasks
(C, that is :>). Both dspam and crm114 boast over 99% accuracy in spotting
spam; it would be really neat if we had that level of protection around
here.
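
To make the point about training data and poisoning concrete, here is a
minimal sketch of a token-based naive Bayes filter. It is not how
SpamAssassin, dspam or crm114 are implemented; the class name TinyBayes,
the sample messages and the add-one smoothing are all assumptions chosen
for illustration. Tokens the filter has never seen are ignored, so padding
a spam with random words does not change its score unless those words
happen to be in the recipient's own non-spam vocabulary.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately crude: lowercase and split on whitespace.
    return text.lower().split()


class TinyBayes:
    """Hypothetical toy filter, for illustration only."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        # Prior log-odds from the number of trained messages (add-one smoothed).
        log_odds = math.log((self.docs["spam"] + 1) / (total_docs + 2))
        log_odds -= math.log((self.docs["ham"] + 1) / (total_docs + 2))
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # unseen tokens carry no evidence either way
            p_spam = (self.counts["spam"][tok] + 1) / (spam_total + len(vocab))
            p_ham = (self.counts["ham"][tok] + 1) / (ham_total + len(vocab))
            log_odds += math.log(p_spam) - math.log(p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up training data resembling list traffic.
f = TinyBayes()
f.train("ham", "apt-get upgrade broke my kernel package")
f.train("ham", "dpkg reports unmet dependencies for the package")
f.train("spam", "cheap pills buy now")
f.train("spam", "buy cheap watches now")

plain = f.spam_probability("buy cheap pills")
padded = f.spam_probability("buy cheap pills holiday weather")
listmail = f.spam_probability("kernel package dependencies")
print(plain, padded, listmail)
```

The padded message scores the same as the plain one because "holiday" and
"weather" never appeared in training. A spammer would have to guess words
like "dpkg" or "kernel" to pull the score down, which is also why a
database trained on someone else's mail is of limited use.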


