Re: Attempts to poison bayesian systems
> One technique that's being used a lot is to get books in electronic form
> and put a couple of sentences in every spam (sentences from a book will pass
> grammatical checking etc., unlike the example you posted above). Also, text
> from a book will have the right ratio of words; you will almost never see
> such a long "sentence" in an email message without punctuation,
> "and", "or", or other common words, except in the case of source code (which
> is another category in Bayesian filters).
That won't work very well against SpamAssassin, as it doesn't rely on
Bayesian filtering alone; it also uses header checks and DNSBL checks. So
you are correct... these "random legitimate" sentences do lower the
Bayesian score, but that alone doesn't get the spam through unless you
are using something like popfilter that relies on Bayesian filtering
only. Also note that they can't have only these sentences in their
emails... they must still have the "catch line" like "increase pen1s size"
or something like that, and the Bayesian filter will, over time, learn
that all the other words are not as important as "pen1s" and the other
catch-line words. So eventually the filter will catch these too... at
least that's my understanding of it. Feel free to improve or correct the
above.
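
To make the above a bit more concrete, here is a rough toy sketch in Python.
It is NOT how SpamAssassin actually computes its Bayes score; ToyBayes,
total_points, the sample messages, and the point weights are all made up for
illustration. It only tries to show three things: padding with book text
lowers the Bayesian score, training on the padded spam pushes the book words
back toward neutral, and the other checks still add points on top.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy per-token Bayesian filter; not SpamAssassin's actual algorithm."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # keyed by is_spam
        self.msgs = {True: 0, False: 0}

    def train(self, text, is_spam):
        # Count each distinct token once per message.
        self.counts[is_spam].update(set(text.lower().split()))
        self.msgs[is_spam] += 1

    def token_prob(self, tok):
        # Smoothed estimate of P(spam | token), assuming equal priors.
        s = (self.counts[True][tok] + 1) / (self.msgs[True] + 2)
        h = (self.counts[False][tok] + 1) / (self.msgs[False] + 2)
        return s / (s + h)

    def score(self, text, top=5):
        # Combine only the `top` tokens farthest from the neutral 0.5.
        probs = sorted((self.token_prob(t) for t in set(text.lower().split())),
                       key=lambda p: abs(p - 0.5), reverse=True)[:top]
        log_odds = sum(math.log(p / (1 - p)) for p in probs)
        return 1 / (1 + math.exp(-log_odds))


def total_points(bayes_prob, header_hits, on_dnsbl):
    # Hypothetical weights: Bayes is just one contribution among several.
    return 3.5 * bayes_prob + 1.0 * header_hits + 2.5 * on_dnsbl


HAM = ["it was the best of times", "see you at the meeting on monday",
       "it was the worst of times"]
PLAIN = "increase pen1s size now"
PADDED = "increase pen1s size it was the best of times"

f = ToyBayes()
for m in HAM:
    f.train(m, False)
f.train(PLAIN, True)
plain_score, padded_before = f.score(PLAIN), f.score(PADDED)

# The padded spam gets reported and learned; book words drift toward neutral.
f.train(PADDED, True)
f.train("increase pen1s size it was the worst of times", True)
padded_after = f.score(PADDED)

print(f"plain={plain_score:.2f} padded before={padded_before:.2f} "
      f"padded after={padded_after:.2f}")
print("points with 1 header hit + DNSBL listing:",
      round(total_points(padded_before, 1, True), 2))
```

With this tiny made-up corpus the padded message scores lower than the plain
one at first, then climbs again once a couple of padded spams have been
learned, because the book words now show up in both spam and ham while
"pen1s" only ever shows up in spam. And even at its lowest, the Bayes part
plus one header hit and a DNSBL listing still lands above a threshold of 5
points in this toy weighting.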