Re: About spam in the list archive
On Mon, 12 Nov 2007 00:24:19 +0100, Thomas Viehmann <tv@beamnet.de> said:
> Hi, if people are interested in seeing spam removal happen, I am looking
> for help testing how this would work. You need
> - a GPG key that is somewhat close to the Debian keyring,
> - to look at hundreds of messages that induced people to click on the
> "this is spam" button and sort them into spam, not spam, misguided
> (e.g. unsubscribe requests and vacation messages), and unsure with a
> bias towards not spam and unsure,
> - python2.5 installed (available in stable/testing/unstable).
> You get
> - a (probably buggy) console program to show you mails and ask for
> your opinion,
> - a largish mbox file (for debian-project it is 6M uncompressed / 2M
> compressed) with the messages,
> - a chance to do something about the spam in our web archive.
> Maybe we could start with debian-project at the present and go back in
> time from there until we get bored, but I'm open to suggestions.
Hmm. I'll be happy to help automate some of the decision making
using my Spam classification mechanisms; please look at
http://www.golden-gryphon.com/software/spam/crm114_accuracy.html
to see the lower bound on accuracy I get from (mostly) Debian email.
Adding SA to the CRM114 results above gives about 99.92% accuracy
overall -- and crm114 has had 100% accuracy in identifying Spam in the
last two years I have been using it.
It would be interesting to see how many messages escape my
filters, and it would give me an opportunity to train them further. All I
need is the mbox file; I would then set up a process to feed the email
to the filters, classify the results, and send the message IDs of the
Ham and Spam back to Debian.
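
A minimal sketch of such a process, under stated assumptions: it is
written for a current Python 3 (not the python2.5 mentioned above), and
is_spam is a hypothetical callable standing in for the real filters
(for instance a wrapper around CRM114 and SpamAssassin), not their actual
interface. It reads the mbox with the standard mailbox module and
collects the Message-IDs into a Ham list and a Spam list.

```python
import mailbox


def classify_mbox(path, is_spam):
    """Return (ham_ids, spam_ids) for the messages in the mbox at `path`.

    `is_spam` is a hypothetical stand-in for the real filter: it takes the
    raw message bytes and returns True if the message is spam.
    """
    ham_ids, spam_ids = [], []
    box = mailbox.mbox(path, create=False)  # fail if the file is missing
    try:
        for key in box.keys():
            msg_id = (box[key].get("Message-ID") or "").strip()
            if not msg_id:
                continue  # nothing to report back without a Message-ID
            raw = box.get_bytes(key)  # raw message, without the "From " line
            (spam_ids if is_spam(raw) else ham_ids).append(msg_id)
    finally:
        box.close()
    return ham_ids, spam_ids
```

The two lists could then be written out one ID per line and sent back;
messages without a Message-ID are skipped here and would need separate
handling.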
manoj
--
Beware of a dark-haired man with a loud tie.
Manoj Srivastava <srivasta@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C