Re: baysian filtering (was: Re: Massive increase of spam on debian-*@l.d.o)

To: debian-user@lists.debian.org
Subject: Re: baysian filtering (was: Re: Massive increase of spam on debian-*@l.d.o)
From: Richard Kimber <rkimber@ntlworld.com>
Date: Wed, 5 May 2004 14:48:10 +0100
Message-id: <20040505144810.740b95ea.rkimber@ntlworld.com>
In-reply-to: <slrnc9hq2j.hg7.spam@home.bounceswoosh.org>
References: <20040505064920.GA20958@comcast.net> <20040505065431.GB20958@comcast.net> <slrnc9hq2j.hg7.spam@home.bounceswoosh.org>

On Wed, 5 May 2004 13:12:51 -0000
"Monique Y. Mudama" <spam@bounceswoosh.org> wrote:

> Anyway, I dutifully pipe them through sa-learn, but I worry.  If these
> spams look so much like regular mail, won't I just end up tainting my
> Bayesian library by teaching sa-learn with them?  I mean, eventually,
> won't my Bayesian scheme be unable to distinguish between spam and ham?
> 
> Thoughts?

If it looks at the headers as well as the body, as Bogofilter does, that
should help it to distinguish.  Also, what you define as ham is surely
more than just well-formed grammar etc.  Your corpus of ham messages
surely contains either a different collection of words, or the same
words at different frequencies of occurrence, than your spam messages
do, and if you train it right, a good Bayesian system should be able to
see the difference.  I should have thought you would only have problems
if your ham normally contains a lot of long-winded jokes similar to the
spam, and the spam comes from sources that your ham normally comes from.
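
To make the word-frequency point concrete, here is a rough sketch of a
toy Bayesian scorer in Python.  It is not how Bogofilter or sa-learn
actually work internally; the class name ToyBayes, the tokenizer and the
sample messages are all made up for illustration.  It just counts tokens
from headers and body together and compares how often each one turned up
in spam versus ham.

```python
import math
import re
from collections import Counter


def tokens(message):
    """Split a raw message (headers and body together) into lowercase tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


class ToyBayes:
    """Hypothetical toy classifier, not the algorithm of any real filter."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, label, message):
        # Header tokens are counted exactly like body tokens.
        self.counts[label].update(tokens(message))
        self.messages[label] += 1

    def spam_log_odds(self, message):
        """Return log(P(spam)/P(ham)); above 0 leans spam, below 0 leans ham."""
        # +1 reserves a slot for unseen tokens and avoids dividing by zero.
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in tokens(message):
            # Add-one smoothing so an unseen token never gives probability 0.
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


# Made-up training data: the spam reads like an ordinary joke, but its
# sender and a few of its words differ from what the ham contains.
toy = ToyBayes()
toy.train("spam", "From: offers@bulk.example\nSubject: a funny story\n\n"
                  "A man walks into a bar ... buy cheap pills now")
toy.train("ham", "From: friend@lists.example\nSubject: kernel upgrade\n\n"
                 "The apt upgrade broke my kernel module")

spammy = toy.spam_log_odds("From: offers@bulk.example\nSubject: story\n\ncheap pills")
hammy = toy.spam_log_odds("From: friend@lists.example\nSubject: apt\n\nkernel module broke")
```

With this made-up data the first score comes out positive and the second
negative, mostly because of the sender tokens and the few words that
occur in only one corpus.  If the spam came from the same senders and
used the same words as the ham, the two scores would drift together,
which is the problem case described above.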

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/
