Re: Attempts to poison bayesian systems

On Tue, 23 Dec 2003 13:25:30 +0000, Dale Amon wrote:

>I've been noticing loads of mails like this lately:
>  Date: Sun, 21 Dec 2003 16:25:34 +0500
>  From: "Joseph Jenkins" <qyzeji@canada.com>
>  Subject: Re: MIT, rest in peace!
>  To: admin@vnl.com
>  X-Mailer: mPOP Web-Mail 2.19
>  emery atrocious larval drippy elate incontrollable raster anglicanism
>  checkerberry feed sit ajar saturable decathlon
>  already climate inhibition pagoda narcissus expository toni

Yes, I'm seeing lots of these too.
A particular pattern is the subject line format you quoted:
  "Re:" followed by a short uppercase "word", followed by some
  random lowercase nonsense (non-dictionary) words.

What do you suppose _that's_ about?
Can anyone think of a filter pattern to catch that?
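
For what it's worth, here is a rough Python sketch of the sort of
pattern I have in mind. The regex, the 2-6 character limit on the
uppercase "word" and the function name are all my own guesses, not
a tested rule, and it makes no attempt to check the trailing words
against a dictionary:

```python
import re

# Hypothetical pattern: "Re:" + a short all-uppercase token + one or
# more lowercase words, with optional trailing punctuation.
# The 2-6 length bound is an assumption, not something measured.
SUBJECT_RE = re.compile(
    r"^Re:\s+[A-Z]{2,6}\b[,\s]+[a-z]+(?:[\s,]+[a-z]+)*\W*$"
)

def looks_like_poison_subject(subject: str) -> bool:
    """Return True if the subject has the Re:/UPPERCASE/lowercase shape."""
    return SUBJECT_RE.match(subject.strip()) is not None
```

Note that this also matches legitimate subjects such as
"Re: FAQ update", so it is probably only safe as one contribution
to a score rather than as a hard reject.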

Also, almost without exception, there's a string of random dictionary
words in the body enclosed within <font color="white"></font> tags,
followed by lots more dictionary words used *as* tag names, thus:
  L0se<angora> weight</fuming> rea</magnificent>lly quickly
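
Both tricks look detectable in principle. A minimal Python sketch,
assuming a small hand-picked list of real tag names (KNOWN_TAGS is
my own, and far from complete) and treating only "white", "#fff"
and "#ffffff" as hidden text; the names TagAudit and audit are
hypothetical:

```python
from html.parser import HTMLParser

# Assumption: an incomplete whitelist of genuine HTML tag names.
KNOWN_TAGS = {
    "html", "head", "title", "meta", "style", "body", "p", "br", "hr",
    "a", "b", "i", "u", "em", "strong", "font", "center", "div", "span",
    "img", "table", "tr", "td", "th", "ul", "ol", "li", "h1", "h2", "h3",
}
WHITE = {"white", "#fff", "#ffffff"}

class TagAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.bogus_tags = 0     # tags whose name is not in KNOWN_TAGS
        self.hidden_chars = 0   # characters inside white <font> spans
        self._font_stack = []   # one bool per open <font>: is it white?

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.bogus_tags += 1
        elif tag == "font":
            color = (dict(attrs).get("color") or "").lower()
            self._font_stack.append(color in WHITE)

    def handle_endtag(self, tag):
        if tag not in KNOWN_TAGS:
            self.bogus_tags += 1
        elif tag == "font" and self._font_stack:
            self._font_stack.pop()

    def handle_data(self, data):
        if any(self._font_stack):
            self.hidden_chars += len(data.strip())

def audit(html_body):
    """Return (bogus tag count, hidden character count) for a body."""
    parser = TagAudit()
    parser.feed(html_body)
    parser.close()  # flush any buffered trailing text
    return parser.bogus_tags, parser.hidden_chars
```

Either count being well above zero could then be fed to the filter
as an extra token or used as a rule on its own; where to set the
threshold is something I have not tried to work out.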

Actually, all the examples I'm seeing are HTML format, so easily
filtered on that basis. Are you seeing plain-text versions?

>I can only assume someone out there is trying to attack
>bayesian systems by loading them up with all sorts of
>normal words so that good mail gets false positives, thus
>breaking the systems.

That sounds plausible :-(

