
Re: About spam in the list archive



On Mon, 12 Nov 2007 00:24:19 +0100, Thomas Viehmann <tv@beamnet.de> said: 

> Hi, if people are interested in making spam removal happen, I am looking
> for help testing how this would work.  You need
> - a GPG key that is somewhat close to the Debian keyring,
> - to look at hundreds of messages that induced people to click on the
>   "this is spam" button and sort them into spam, not spam, misguided
>   (e.g. unsubscribe requests and vacation messages), and unsure, with a
>   bias towards not spam and unsure,
> - python2.5 installed (available in stable/testing/unstable).

> You get
> - a (probably buggy) console program to show you mails and ask for
>   your opinion,
> - a largish mbox file (for debian-project it is 6M uncompressed / 2M
>   compressed) with the messages,
> - a chance to do something about the spam in our web archive.

> Maybe we could start with debian-project for now and go back in time
> from there until we get bored, but I'm open to suggestions.

        Hmm. I'll be happy to help automate some of the decision making
 using my Spam classification mechanisms; please look at 
   http://www.golden-gryphon.com/software/spam/crm114_accuracy.html
 to see the lower bound on accuracy I get from (mostly) Debian email.
 Adding SA (SpamAssassin) to the CRM114 results above gives about 99.92%
 accuracy overall, and crm114 has had 100% accuracy in identifying spam
 over the last two years I have been using it.

        It would be interesting to see how many messages escape my
 filters, and it would give me an opportunity to train them further. All
 I would need is the mbox file; I would then set up a process to feed
 the email to the filters, classify the results, and send the message IDs
 of ham and spam back to Debian.

        manoj
-- 
Beware of a dark-haired man with a loud tie.
Manoj Srivastava <srivasta@debian.org> <http://www.debian.org/~srivasta/>  
1024D/BF24424C print 4966 F272 D093 B493 410B  924B 21BA DABB BF24 424C


