
Re: HTML mail



On Wed, Aug 13, 2003 at 05:56:04PM +0200, Richard Lyons wrote:
| On Wednesday 13 August 2003 7:46 pm, Jeff Elkins wrote:
| > Most of the spam I receive is HTML format. Is there a fairly painless way
| > of sending anything formatted HTML to my trash folder?

The Content-Type: header *ought* to provide this information.
However, with spam you can't assume that any rules will be
followed.
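Here is a sketch of the "trust the headers" approach. It assumes the
message is available as raw text and uses Python's standard `email`
package to look for any MIME part declared as text/html. The function
name `declares_html` is made up for illustration. As said above, a
spammer is free to mislabel the HTML, so this only catches honestly
labelled messages.

```python
from email import message_from_string
from email.message import Message


def declares_html(msg: Message) -> bool:
    """Return True if any MIME part's Content-Type header says text/html.

    This trusts the headers, which spam is free to get wrong.
    """
    # walk() yields the message itself and then every subpart, depth-first.
    # A part with no Content-Type header defaults to text/plain.
    return any(part.get_content_type() == "text/html" for part in msg.walk())


# Hypothetical single-part message labelled as HTML.
raw = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<p>Buy now</p>\n"
)
print(declares_html(message_from_string(raw)))  # True
```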

| > I use kmail and sid.
| 
| I use 
| <body> contains "a href=" OR <body> contains "/form>" OR <body> contains 
| "/body>"
| 
| Or similar combinations of two or three of the commonest tags.  For some 
| reason filtering on "<html>" doesn't seem to work.

The reason is that a lot of spam is not well-formed HTML (the <html>
tag may be missing or mangled, for example).  Despite that, the spam
is (apparently) effective because mail readers and browsers tend to
render bad HTML fairly reasonably.

| Also you need to use small fragments because the tags are often
| broken across lines and then not identified.

This is an attempt by the spammers to bypass simple-minded
text-matching filters.  One solution to this is to parse the HTML the
way the mail reader would and match on the parsed version.
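Here is a minimal sketch of matching on parsed HTML rather than raw
text. It uses Python's standard `html.parser` module as a stand-in for
whatever parser the mail reader actually uses. That is an assumption:
real readers differ in how forgiving they are. `TagCollector` and
`html_tags` are hypothetical names. The sample body has no <html> or
<body> tag, and it splits an anchor tag across a line break.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the names of start tags, the way a tolerant renderer sees them."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and accepts whitespace,
        # including newlines, between the tag name and its attributes.
        self.tags.add(tag)


def html_tags(body: str) -> set[str]:
    parser = TagCollector()
    parser.feed(body)
    parser.close()
    return parser.tags


# Hypothetical spam body: no <html> or <body>, and the anchor tag
# is split across a line break to dodge "a href=" substring rules.
body = 'Cheap stuff! <a\nhref="http://example.com/">click</a><form action="x"></form>'

print("a href=" in body)        # False: the naive substring rule misses it
print(sorted(html_tags(body)))  # ['a', 'form']
```

A filter could then trash messages whose tag set includes, say, `a` or
`form`. Which tags to act on is left open here.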

Better yet, use spamassassin or spambayes to identify spam itself
rather than HTML.  Filtering on HTML alone will also catch legitimate
mail, because many people send non-spam messages in HTML or with both
text/plain and text/html parts.
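The sketch below shows the distinction drawn above. It classifies a
message as HTML-only, as carrying both text/plain and text/html parts,
or as having no HTML, based on the declared Content-Type of each part.
The function `html_shape` and its labels are made up for this
illustration. Whether to treat the first two cases differently is a
judgment call, since a spammer can add a text/plain part too.

```python
from email import message_from_string
from email.message import Message


def html_shape(msg: Message) -> str:
    """Classify a message by which text parts its headers declare.

    Returns "html-only", "plain+html", or "no-html" (labels invented
    for this sketch).
    """
    # Collect the declared content type of the message and every subpart.
    types = {part.get_content_type() for part in msg.walk()}
    has_html = "text/html" in types
    has_plain = "text/plain" in types
    if has_html and has_plain:
        return "plain+html"
    if has_html:
        return "html-only"
    return "no-html"


# Hypothetical legitimate message sent as multipart/alternative.
raw = (
    "Subject: meeting notes\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "See you at 3.\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>See you at 3.</p>\n"
    "--XX--\n"
)
print(html_shape(message_from_string(raw)))  # plain+html
```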

-D

-- 
"Don't use C;  In my opinion,  C is a library programming language
 not an app programming language."  - Owen Taylor (GTK+ developer)
 
http://dman13.dyndns.org/~dman/


