Re: Attempts to poison bayesian systems
On Tue, 23 Dec 2003 13:25:30 +0000, Dale Amon wrote:
>I've been noticing loads of mails like this lately:
> Date: Sun, 21 Dec 2003 16:25:34 +0500
> From: "Joseph Jenkins" <email@example.com>
> Subject: Re: MIT, rest in peace!
> To: firstname.lastname@example.org
> X-Mailer: mPOP Web-Mail 2.19
> emery atrocious larval drippy elate incontrollable raster anglicanism
> checkerberry feed sit ajar saturable decathlon
> already climate inhibition pagoda narcissus expository toni
Yes, I'm seeing lots of these too.
A particular pattern is the subject line format you quoted:
"Re:" followed by a short uppercase "word", followed by some
random lowercase nonsense (non-dictionary) words.
What do you suppose _that's_ about?
Can anyone think of a filter pattern to catch that?
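
As a starting point, here is a rough Python sketch of such a pattern. The name looks_like_poison_subject and the 2-6 letter bound on the uppercase token are my own guesses, not anything taken from the spam itself, and a regex cannot tell dictionary words from nonsense, so this only checks the overall shape.

```python
import re

# Assumed shape: "Re:", a short all-caps token (2-6 letters is a guess),
# optional punctuation, then one or more all-lowercase words.
SUBJECT_RE = re.compile(r"^Re:\s+[A-Z]{2,6}[,:]?(?:\s+[a-z]+)+[.!?]?\s*$")

def looks_like_poison_subject(subject: str) -> bool:
    """Return True if the subject has the 'Re: CAPS lowercase words' shape."""
    return SUBJECT_RE.match(subject) is not None
```

It will also hit legitimate subjects such as "Re: PGP key signing", so it is probably better used to nudge a score than to reject mail outright.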
Also, almost without exception, there's a string of random dictionary
words in the body enclosed within <font color="white"></font> tags,
followed by lots more included *as* tags, thus:
L0se<angora> weight</fuming> rea</magnificent>lly quickly
Actually, all the examples I'm seeing are HTML format, so they're easily
filtered on that basis. Are you seeing plain-text versions?
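
If filtering on HTML alone is too blunt, the two tricks above could be counted directly. This is only a sketch using Python's standard html.parser; the KNOWN_TAGS list is deliberately partial and the names JunkTagCounter and scan are made up for illustration. It also assumes the message background is white, so that white text is effectively hidden.

```python
from html.parser import HTMLParser

# Partial list of real HTML tags; anything else is counted as junk.
KNOWN_TAGS = {"html", "head", "title", "body", "p", "br", "a", "b", "i", "u",
              "font", "table", "tr", "td", "img", "div", "span", "center", "hr"}
WHITE = {"white", "#ffffff", "#fff"}

class JunkTagCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.unknown_tags = 0       # made-up tags such as <angora> or </fuming>
        self.white_font_words = 0   # words inside <font color="white">
        self._font_stack = []       # one bool per open <font>: is it white?

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            color = (dict(attrs).get("color") or "").strip().lower()
            self._font_stack.append(color in WHITE)
        elif tag not in KNOWN_TAGS:
            self.unknown_tags += 1

    def handle_endtag(self, tag):
        if tag == "font":
            if self._font_stack:
                self._font_stack.pop()
        elif tag not in KNOWN_TAGS:
            self.unknown_tags += 1

    def handle_data(self, data):
        # Count words only while at least one enclosing <font> is white.
        if any(self._font_stack):
            self.white_font_words += len(data.split())

def scan(html: str):
    """Return (unknown tag count, number of words in white font)."""
    parser = JunkTagCounter()
    parser.feed(html)
    parser.close()
    return parser.unknown_tags, parser.white_font_words
```

Run against the "L0se<angora> weight" line above, scan() reports three unknown tags and no white-font words; a message scoring high on either count looks like one of these.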
>I can only assume someone out there is trying to attack
>bayesian systems by loading them up with all sorts of
>normal words so that good mail gets false positives, thus
>breaking the systems.
That sounds plausible :-(
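
To put a rough number on it, here is a naive per-token estimate in Python. It is a simplification: real filters add smoothing, clamping and their own weighting, and the counts below are invented purely for illustration.

```python
def spam_probability(spam_hits, n_spam, ham_hits, n_ham):
    """Naive estimate of P(spam | token) from per-corpus message counts."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_rate / (spam_rate + ham_rate)

# Hypothetical word such as "climate": in 5 of 1000 good mails, no spam yet.
before = spam_probability(0, 1000, 5, 1000)

# After 20 word-salad spams containing it have been trained as spam.
after = spam_probability(20, 1020, 5, 1000)
```

With those made-up counts the word goes from 0.0 to roughly 0.8, so a few dozen trained junk mails would be enough to make an ordinary word look spammy under this naive estimate.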
Merry Happy Season Of Jollyness everyone
The 2003 Perl Advent Calendar: http://perladvent.org/2003/