
Re: Request for help - cleaning spam from the debian-boot mailing list archive



Quoting Frans Pop (elendil@planet.nl):
> On Monday 01 June 2009, Christian Perrier wrote:
> > To be even more efficient, I wonder if there's a possibility to
> > download list archives as a mailbox. That would make spam tagging more
> > efficient than going through the web interface.
> 
> scp master.debian.org:~debian/lists/debian-boot/debian-boot.yyyymm.gz .
> 
> Only works for DDs obviously. Disadvantage is that this archive will still 
> have all spam that's already been removed...
> I'm sticking with the web interface myself.


Yesterday, I grabbed several such mailboxes.

Before working on them, I passed the messages through CRM114, which I
have been using for a while to score my incoming messages:

zcat debian-boot.200608.gz | formail -s /usr/bin/crm -u /home/bubulle/.crm114/ mailfilter.crm >> debian-boot.200608.scored
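
To score several downloaded archives in one go, the same pipeline could
be wrapped in a loop. This is only a sketch: scored_name and
score_archives are hypothetical helper names, $HOME/.crm114/ is assumed
to be the CRM114 directory, and the output is written with ">" rather
than ">>" so that a re-run does not append duplicates.

```shell
# Hypothetical helpers; only the zcat | formail | crm pipeline itself
# is taken from the command above.

# Derive the output name: debian-boot.200608.gz -> debian-boot.200608.scored
scored_name() {
    printf '%s\n' "${1%.gz}.scored"
}

# Score every archive passed as an argument, one scored mailbox per archive.
score_archives() {
    for gz in "$@"; do
        # Skip arguments that are not existing files (e.g. an unmatched glob).
        [ -f "$gz" ] || continue
        zcat "$gz" \
            | formail -s /usr/bin/crm -u "$HOME/.crm114/" mailfilter.crm \
            > "$(scored_name "$gz")"
    done
}
```

It would be called as, for example: score_archives debian-boot.2006*.gz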


That creates a new "scored" mailbox where messages have additional
headers, including:

X-CRM114-Status: Good  ( pR: 161.9126 )
or
X-CRM114-Status: UNSURE (1.1278) This message is 'unsure'; please train it!
or
X-CRM114-Status: SPAM  ( pR: -15.1978 )
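
To get a quick idea of how much work a scored mailbox holds before
opening it, the verdicts in these headers can be tallied. A minimal
sketch, assuming the header layout shown above; count_crm_status is a
hypothetical helper name, and a status line quoted at the start of a
line inside a message body would be counted too.

```shell
# Hypothetical helper: tally the CRM114 verdicts found in a scored mbox.
# Prints one "VERDICT count" line per verdict, sorted by verdict name.
count_crm_status() {
    awk 'tolower($1) == "x-crm114-status:" { n[toupper($2)]++ }
         END { for (v in n) print v, n[v] }' "$1" | sort
}
```

For example: count_crm_status debian-boot.200608.scored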


In my .muttrc, I have this:
color header white black ^X-CRM114-Status:.*Good.*
color header blue black ^X-CRM114-Status:.*SPAM.*
color header red black ^X-CRM114-Status:.*UNSURE.*

Then I read this mailbox with mutt.....

With these settings, "unsure" messages appear in red and "sure" spam
appears in blue.

Then, I can "tag" messages ('T' in mutt's default keymapping) easily
by using the colors as a helper (of course I *do* check for false
positives) and also go through messages identified as "non
spam".....and tag those that are actually spam.

Then, all these tagged messages are piped to my "report list spam"
macro, and also identified as spam to CRM114 by piping them to
"$HOME/.crm114/mailfilter.crm -u $HOME/.crm114/ ss-pam --force".

Then, all "good" messages are identified as ham to CRM114.


In conclusion, I found this method considerably more efficient than
using the web interface, and of course it allows working offline, which
is a must-have for me.

