
Re: IMPORTANT: your message to html-tidy



Julian Mehnle wrote [Tue, Sep 09, 2003 at 12:50:51PM +0200]:
> No, you can't make such a general statement that using content-based
> filters is "better" than using DNS RBLs.  It wholly depends on the
> listing policy of the RBL, and in most cases, content-based filters will
> be the far worse option, because it only drives spammers to make their
> spam stick out from the general mail noise less and less!  I.e. after
> prolonged, widespread use of content-based filters, spam won't be easily
> distinguishable from your normal mail traffic anymore from a machine's
> point of view. 

I suggest you take a look at Paul Graham's writings on filtering
spam, among them:

http://www.paulgraham.com/spam.html
http://www.paulgraham.com/better.html

As Bayesian filters learn the patterns considered spam, they get better
and better at classifying it. It is very hard for spammers to forge
genuine-looking mails - and even then, it will be quite easy to catch
them as spam. Yes, no automated system is perfect, and this message
might get ranked a little higher than it should - but the "bad words"
do not appear near the top of the message. A legitimate mail is very
unlikely to be about porn, pr0n, viagra, v1agra, debt reduction,
mortgage renegotiation and such. It is very hard for a spam not to try
to be snappy and grab your attention - spam will almost always be
text/html, not text/plain. Spam will VERY LIKELY BE ALL CAPS, FULL OF
EXCLAMATIONS!!!!
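
To make the idea concrete, here is a minimal sketch of a token-based
Bayesian filter in Python. It is a simplified naive Bayes with add-one
smoothing, not the exact formula from the essays linked above; the
class name TinyBayes, the tokenizer and the training mails are all
made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Case is preserved on purpose, so "FREE" and "free" are distinct tokens.
    return re.findall(r"[A-Za-z0-9]+", text)


class TinyBayes:
    """Toy naive Bayes spam scorer (illustrative only)."""

    def __init__(self):
        self.spam = Counter()   # number of spam mails containing each token
        self.ham = Counter()    # number of legitimate mails containing each token
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per mail
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def token_prob(self, tok):
        # Add-one smoothing keeps both estimates strictly between 0 and 1.
        p_spam = (self.spam[tok] + 1) / (self.nspam + 2)
        p_ham = (self.ham[tok] + 1) / (self.nham + 2)
        return p_spam / (p_spam + p_ham)

    def score(self, text):
        # Sum per-token log-odds, then map back to a probability-like value.
        log_odds = 0.0
        for tok in set(tokenize(text)):
            p = self.token_prob(tok)
            log_odds += math.log(p) - math.log(1.0 - p)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    f = TinyBayes()
    f.train("BUY V1AGRA NOW", True)
    f.train("FREE mortgage renegotiation CLICK NOW", True)
    f.train("The tidy patch is attached, please review", False)
    f.train("Meeting moved to Tuesday, please confirm", False)
    print(f.score("FREE V1AGRA NOW"))
    print(f.score("please review the patch"))
```

A score above 0.5 means the tokens seen lean towards spam, and a token
the filter has never seen contributes nothing either way. The more
mail it is trained on, the sharper the per-token estimates become.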

In short: if a spammer resorts to writing genuine-looking email, it will
be less effective as advertising, since it will catch fewer eyeballs. A
few products dominate 80% of the spamming scene, and we can almost
safely mark mail about any of them as spam.

Greetings,

-- 
Gunnar Wolf - gwolf@gwolf.cx - (+52-55)5630-9700 ext. 1366
PGP key 1024D/8BB527AF 2001-10-23
Fingerprint: 0C79 D2D1 2C4E 9CE4 5973  F800 D80E F35A 8BB5 27AF


