
Spam cleaning in list archives: one last effort needed



We're now quite close to being able to say "we went through the entire
archive of debian-boot and cleaned it of spam".

Indeed, when looking at the coordination page [1], one can see that
between November 1998 and December 2009, nearly all months have been
reviewed by at least 5 people, which is the minimum to guarantee that
spam is identified (a given message has to be reported 5 times to be
tagged as potential spam and proposed for review).
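
For illustration only, here is a minimal Python sketch of that threshold
rule. It is not the code used by the list archive tooling: the function
name, the (reviewer, message id) input format and the choice to count each
reviewer at most once per message are all assumptions made for the example.

```python
from collections import Counter

# From the rule above: a message needs 5 reports before it is proposed for review.
REPORT_THRESHOLD = 5


def review_candidates(reports, threshold=REPORT_THRESHOLD):
    """Return the ids of messages reported by at least `threshold` reviewers.

    `reports` is an iterable of (reviewer, message_id) pairs. This input
    format is hypothetical, chosen only for this sketch.
    """
    # Assumption: a reviewer counts once per message, so duplicates are dropped.
    unique_reports = set(reports)
    counts = Counter(message_id for _reviewer, message_id in unique_reports)
    return sorted(mid for mid, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Five different reviewers report msg00042; one reviewer reports
    # msg00007 five times, which still counts as a single report.
    reports = [(f"reviewer{i}", "msg00042") for i in range(5)]
    reports += [("reviewer0", "msg00007")] * 5
    print(review_candidates(reports))
```

With this rule, a month reviewed by fewer than 5 people can never push a
message over the threshold, which is why those months still need reviewers.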

The remaining months, which should be targeted by anybody but
me, Frans Pop, Holger Wansing and Lee Winter, are:
* March 1999 to May 2000
* October 2000 to July 2002
* May and June 2004

So, as one can see, only "old" archives still potentially have spam.

I suggest that anybody wanting to help complete this focus on the two
2004 months first. Be aware that the traffic was huge at that time (May
2004 has 4126 messages!). 2001-2002 have about 1000-1500 mails a
month, most of them being CVS commit mails. 1999-2000 have a
few hundred, again most of them being commits.

If you're interested in statistics, you can look at [2] to learn that
"our" list is by far the one that got the most cleaning. As of now we
have identified and removed 4550 posts. Only 176 posts initially reported
as spam were finally identified as ham and of course kept.


[1] http://wiki.debian.org/DebianInstaller/SpamClean
[2] http://lists.debian.org/archive-spam-removals/review/stats.html


-- 

