
Re: The "clean out spam from archives" effort is lagging



Quoting Lee Winter (lee.j.i.winter@gmail.com):

> I did the most recent three months of 2009, but the density was pretty low.

I haven't checked the wiki and I'm not online right now, but please
take care to record this on the wiki page.

> 
> > Old archives are also missing reviews, particularly a few from 2005
> > and nearly all from 2004, not to mention older archives.
> 
> So I started at the beginning (part of 1998) and went to the end of
> 2002.  If I have time this week I will look at 2003-2005.

Ditto.

> > Please take some time to do this work. This is not that time
> > consuming: one month can be reviewed in about 10-15 minutes....even
> > less when you're used to methods for spotting spams.
> 
> The work is pretty tedious and reviewing non-spam emails five times is
> extremely inefficient.  Consider a solution that would allow one
> person to scan the archive to generate a list of spam targets.  If the
> other four reviewers only had to review the listed spam candidates
> they would not have to waste their time reviewing non-spam.

I'm sure the listmasters would welcome such improvements but, well, we
already have a very good tool.

Also, restricting the list to what the first person has identified
would increase the risk of missing some spams.

When I worked on the entire archive, I finally dropped the web
interface and used an alternative method:

- download the list archives as mailboxes
- pass them through my CRM114 spam filter
- open them in my MUA (mutt)
- tag spam messages (since the mailboxes have already been through
CRM114, most spam already carries CRM114 markers)
- bounce them to the spam report mail address
(report-listspam@lists.debian.org) with the following key macro:

macro index \eL "breport-listspam@lists.debian.org\no\nq" "report as spam to Debian lists"

I found this much more efficient.
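
As a rough illustration of the "already identified by CRM114 markers" step, here is a minimal Python sketch that lists the messages in a downloaded mailbox that the filter has flagged, so they can be checked before tagging. It is not part of the workflow described above. It assumes the filter adds an X-CRM114-Status header whose value starts with SPAM; the function name find_flagged and the file name used below are hypothetical, and the header name and marker should be adjusted to match your own CRM114 setup.

```python
import mailbox


def find_flagged(mbox_path, header="X-CRM114-Status", marker="SPAM"):
    """Return (key, subject) pairs for messages the filter marked as spam.

    Assumption: the spam filter adds `header` to each message and its
    value begins with `marker` for spam. Adjust both to your setup.
    """
    flagged = []
    # create=False: fail loudly if the mailbox file does not exist.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.items():
            status = str(msg.get(header, "")).strip().upper()
            if status.startswith(marker.upper()):
                flagged.append((key, str(msg.get("Subject", ""))))
    finally:
        box.close()
    return flagged


if __name__ == "__main__":
    import sys

    # Usage: python find_flagged.py some-list-archive.mbox
    for key, subject in find_flagged(sys.argv[1]):
        print(key, subject)
```

This only reads the mailbox and prints candidates. Messages the filter did not flag still need a human look, since restricting the review to flagged messages would miss some spam.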

Downloading list archives as mailboxes is only accessible to Debian
developers but I can provide them to people who might need them.

