Re: Spam in the lists out of control

To: debian-devel@lists.debian.org, Eike zyro Sauer <eikes@cs.tu-berlin.de>
Subject: Re: Spam in the lists out of control
From: Marek Habersack <grendel@debian.org>
Date: Mon, 10 May 2004 14:27:20 +0200
Message-id: <20040510122720.GA3461@beowulf.thanes.org>
Mail-followup-to: debian-devel@lists.debian.org, Eike zyro Sauer <eikes@cs.tu-berlin.de>
Reply-to: grendel@debian.org
In-reply-to: <20040510035717.GA13083@duncf.mine.nu>
References: <Pine.LNX.4.58.0405091544530.5151@cantor.unex.es> <20040509162756.GB4038@espresso> <c7ln5c$s41$1@sea.gmane.org> <20040510020933.GC3199@beowulf.thanes.org> <20040510035717.GA13083@duncf.mine.nu>

On Sun, May 09, 2004 at 11:57:17PM -0400, Duncan Findlay scribbled:
> On Mon, May 10, 2004 at 04:09:33AM +0200, Marek Habersack wrote:
> > On Sun, May 09, 2004 at 06:44:36PM +0200, Eike zyro Sauer scribbled:
> > > Andrew Lau schrieb:
> > > > Has debian.org's Spamassassin Bayesian database been poisoned? If so,
> > > > would flushing the database at random intervals be enough to keep its
> > > > usefulness feasible or would it just let too much spam in after each flush?
> > > 
> > > I'd "donate" 6000 spam mails, if this helps.
> > I could add my 14845 spams, too :)
> 
> Pfff... you can have my 63,286 spams if you really want, but it won't
> really help you. The thing with a Bayesian database is that the mail
> it's trained on needs to be similar to the mail it will be tested
> against.
Most of my spam comes from the debian lists, so I would say it is similar
enough to the traffic down here.
 
> For what it's worth, empirical evidence indicates that SpamAssassin's
> Bayesian database is difficult to poison, since it's difficult for
> spammers to pick words that are learned as non-spammy (since everyone
> has their own set of non-spammy words). But, since lists.debian.org
> doesn't use bayes, this point is moot.
I don't understand why SpamAssassin is thought to be the only option. SA is
a CPU/memory hog that can easily kill even a fairly powerful machine, and
there _are_ alternatives to it. One option is dspam, as I pointed out in my
other post; another (which also uses language classification and is already
packaged for Debian) is crm114; and then there is a whole host of Bayesian
filter programs written in a language suited for heavy-duty tasks (C, that
is :>). Both dspam and crm114 boast over 99% accuracy in spotting spam. It
would be really neat if we had that level of protection around here.
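
To make the point about training data concrete, here is a minimal sketch of a
naive Bayes token scorer in Python. It is not how SpamAssassin, dspam or
crm114 are implemented; the function names, the add-one smoothing and the toy
corpora are all assumptions invented for illustration. It only shows that a
spam corpus resembling the incoming mail gives a strong score, while an
unrelated corpus gives a score close to zero.

```python
import math
from collections import Counter


def train(messages):
    """Count token occurrences over a list of plain-text messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def spam_log_odds(text, spam_counts, ham_counts):
    """Naive Bayes log-odds with add-one smoothing; > 0 leans towards spam."""
    vocab = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    score = 0.0
    for token in text.lower().split():
        p_spam = (spam_counts[token] + 1) / (spam_total + vocab)
        p_ham = (ham_counts[token] + 1) / (ham_total + vocab)
        score += math.log(p_spam / p_ham)
    return score


# Hypothetical toy corpora, made up for this example only.
list_spam = ["cheap pills online", "cheap loans approved online"]
list_ham = ["package upload failed on sparc", "bug in the package build"]
other_spam = ["gewinnen sie jetzt", "jetzt kredit sofort"]

msg = "cheap pills approved"

# Spam corpus similar to the incoming mail: every token has been seen as spam.
matched = spam_log_odds(msg, train(list_spam), train(list_ham))

# Spam corpus unrelated to the incoming mail: no token has been seen at all,
# so the score reflects nothing but the smoothing terms.
mismatched = spam_log_odds(msg, train(other_spam), train(list_ham))

print(f"matched corpus:    {matched:.2f}")
print(f"mismatched corpus: {mismatched:.2f}")
```

The same sketch hints at why poisoning is hard: a spammer would have to guess
tokens that already carry high counts in the recipient's own ham corpus.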

regards,

marek

Attachment: signature.asc
Description: Digital signature

Follow-Ups:
- Re: Spam in the lists out of control
  - From: Pascal Hakim <pasc@redellipse.net>
- Re: Spam in the lists out of control
  - From: Blars Blarson <blarson@blars.org>

References:
- Spam in the lists out of control
  - From: Santiago Vila <sanvila@unex.es>
- Re: Spam in the lists out of control
  - From: Andrew Lau <netsnipe@users.sourceforge.net>
- Re: Spam in the lists out of control
  - From: "Eike \"zyro\" Sauer" <eikes@cs.tu-berlin.de>
- Re: Spam in the lists out of control
  - From: Marek Habersack <grendel@debian.org>
- Re: Spam in the lists out of control
  - From: Duncan Findlay <duncf@debian.org>
