
Re: IMPORTANT: your message to html-tidy



on Tue, Sep 09, 2003 at 12:50:51PM +0200, Julian Mehnle (lists@mehnle.net) wrote:
> Karsten M. Self <kmself@ix.netcom.com> wrote:
> > [Using DNS RBLs to block spam is bad.]
> > As many people have noted, for pretty much _any_
> > given IP, your odds are good that most of the mail received from it is
> > spam.  It doesn't do much for the legit mail that comes through.  Given
> > that we now _do_ have good content/context based filters for assessing
> > spam likelihood for a given mail item, blind use of RBLs should be
> > discouraged.  It's the same sort of thinking that's causing no end of
> > trouble for people trying to communicate with AOL users:
> > 
> >     http://z.iwethey.org/forums/render/content/show?contentid=96264
> >     http://yro.slashdot.org/yro/03/04/13/2215207.shtml?tid=120

Please set your mailer/editor linewrap to 68-75 characters.  I strongly
recommend 72 as a good default.

Thank you.


Mostly amplifying Steve Lamb's comments, but:

> No, you can't make such a general statement that using content-based
> filters 

I didn't.  I said content / *context*.

Challenge-response (C-R) systems, RBLs, Vipul's Razor, and similar tools
are all _single_ _factor_ analysis.  C-R and RBLs are broad: they assess
the source or putative sender, not the content.  Razor is highly
content-specific, but requires that someone else has already seen the
message and classified it (granted, this is highly likely).
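For illustration, here is roughly what an RBL lookup weighs: only the
connecting IP address.  This is a minimal sketch of the usual DNSBL
query convention, assuming an IPv4 address and a hypothetical zone
`dnsbl.example.org`.  Reverse the octets, append the list's zone, and
treat any answer as "listed".  The resolver is passed in as a parameter
so the example runs without network access.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Single-factor verdict: listed if the query name resolves at all."""
    try:
        resolve(dnsbl_query_name(ip, zone))  # listings usually answer 127.0.0.x
        return True
    except socket.gaierror:  # name does not resolve -> not listed
        return False

# Stub resolver standing in for DNS; the zone name is hypothetical.
def fake_resolve(name: str) -> str:
    if name == "99.2.0.192.dnsbl.example.org":
        return "127.0.0.2"
    raise socket.gaierror("not listed")

print(is_listed("192.0.2.99", "dnsbl.example.org", fake_resolve))  # True
print(is_listed("192.0.2.1", "dnsbl.example.org", fake_resolve))   # False
```

Note that the message body never enters into it: every mail from a
listed address gets the same verdict, legitimate or not.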

SpamAssassin is a multiple-measures analysis.

   - Scored content, keywords, and message format (e.g.:  HTML, word
     bolding, font color, etc.).

   - Specific network tests, including both RBLs and Razor, as well as
     DUL and other specific IP block lookups, if chosen.

   - Auto-whitelisting of sender.

   - Bayesian classifiers, as previously discussed here and elsewhere.

Each factor is weighted into the final score.  The overall result is
highly accurate because it takes into account both general factors (RBL,
DUL) and specific ones (auto-whitelist, Bayesian classifier).  The
scores themselves are adjusted over time against current spam and ham
corpora, so the weighting remains appropriate over time.
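A toy sketch of the multiple-measures idea, not SpamAssassin's actual
rules or code: the rule names and weights below are hypothetical, and
the 5.0 cut-off is an assumption modelled on SpamAssassin's default.
Each test that fires contributes its weight, whitelisting contributes a
negative weight, and only the sum is compared to the threshold.

```python
# Hypothetical rule names and weights, for illustration only; real
# SpamAssassin rules and scores differ and are retuned over time.
WEIGHTS = {
    "html_body": 0.5,        # content/format test
    "rbl_listed": 2.0,       # network test (RBL)
    "bayes_spammy": 3.0,     # Bayesian classifier verdict
    "sender_trusted": -2.0,  # whitelisted sender pulls the score down
}

THRESHOLD = 5.0  # assumed cut-off

def classify(hits, threshold=THRESHOLD):
    """Sum the weights of the tests that fired; no single test decides."""
    total = sum(WEIGHTS[name] for name in hits)
    return total, total >= threshold

print(classify(["rbl_listed"]))                                # (2.0, False)
print(classify(["html_body", "rbl_listed", "bayes_spammy"]))   # (5.5, True)
print(classify(["html_body", "rbl_listed", "bayes_spammy",
                "sender_trusted"]))                            # (3.5, False)
```

An RBL hit on its own stays below the threshold; it takes corroborating
evidence from other tests to push a message over, and a trusted sender
can pull it back.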

> is "better" than using DNS RBLs.  It wholly depends on the listing
> policy of the RBL, and in most cases, content-based filters will be
> the far worse option, because it only drives spammers to make their
> spam stick out from the general mail noise less and less!  I.e.  after
> prolonged, widespread use of content-based filters, spam won't be
> easily distinguishable from your normal mail traffic anymore from a
> machine's point of view.

Bollux.

In addition to prior comments on Bayesian filtering methods, I'd like to
see how you propose spammers would forge known, trusted GPG signatures,
for example.


Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    Defeat EU Software Patents!                         http://swpat.ffii.org/


