Proposed removal of spam from the debian-project mailing list web archives
Hi,
Four people have checked the spam web form submissions concerning
debian-project. More background can be found at [1]. Thanks to Bas
Wijnen, Paul Wise, and Richard Hecker for reviewing! (Of course, a
special mention to Y Giridhar Appaji Nag who already looked through
debian-devel, but that isn't ripe for action yet.)
Proposal
--------
I propose to remove the 436 messages unanimously classified "spam" from
the web archive.[2]
Note, these will remain available to Developers on master.debian.org and
messages will be reincluded if complaints about an erroneous removal are
received by the Listmaster, as discussed at [1] (Policy corner stones).
Some statistics
---------------
Number of messages by range of classification responses (the four
possible responses are explained at [1]):
839  submissions reviewed
436  spam
225  not spam
  6  inapp
  1  unknown
 68  unknown, spam
 33  unknown, not spam
 18  inapp, spam
  9  unknown, inapp
  3  not spam, inapp
 17  unknown, spam, inapp
  8  unknown, not spam, spam
  5  not spam, spam
  2  spam, not spam, inapp
  4  unknown, not spam, inapp
  4  unknown, inapp, not spam, spam
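For readers who want to reproduce a tally like the one above, here is a
minimal sketch in Python. It is not the script mentioned in [2]; the input
format (a dict mapping message-ids to lists of answers), the function names
and the sample data are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical input: one list of reviewer answers per message-id.
# Both the ids and the votes are invented for this example.
votes = {
    "msg-a@example.org": ["spam", "spam", "spam", "spam"],
    "msg-b@example.org": ["spam", "unknown", "not spam", "inapp"],
    "msg-c@example.org": ["not spam", "not spam", "not spam"],
    "msg-d@example.org": ["unknown", "spam"],
}


def response_range(answers):
    """Return the set of distinct answers given for one message."""
    return frozenset(answers)


def tally(votes):
    """Count messages per distinct set of answers."""
    return Counter(response_range(a) for a in votes.values())


def unanimous_spam(votes):
    """Message-ids for which every reviewer answered "spam"."""
    return sorted(m for m, a in votes.items()
                  if response_range(a) == {"spam"})


def conflicting(votes):
    """Message-ids that received both "spam" and "not spam" answers."""
    return sorted(m for m, a in votes.items()
                  if {"spam", "not spam"} <= response_range(a))


if __name__ == "__main__":
    for answers, count in tally(votes).most_common():
        print(count, ", ".join(sorted(answers)))
```

Under these assumptions, unanimous_spam() would yield the removal
candidates and conflicting() the messages that need a closer look.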
Analysis of the debian-project review
-------------------------------------
We should be most concerned about the messages with (detected) errors,
namely those where the answers contain both "spam" and "not spam", so
below are the message-ids (best used in conjunction with [3]) and some
analysis of the nature of these messages.
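To look up several of the message-ids below in one go, the URL pattern
from [3] can be filled in programmatically. The following is only a
sketch: the function name is made up, and percent-encoding the message-id
is my assumption about what the search page accepts, not something it
documents.

```python
from urllib.parse import quote

# URL pattern taken from footnote [3].
SEARCH_URL = "http://lists.debian.org/msgid-search/%s"


def msgid_url(message_id):
    """Build a search URL for one message-id.

    Assumption: characters such as "%" and "$" need percent-encoding,
    while "@" can stay as it is.
    """
    return SEARCH_URL % quote(message_id, safe="@")


if __name__ == "__main__":
    for mid in ("C03C7E6F.2EAF%jonas.hedlund@trigger.se",
                "000a01c2f63a$55e0ff20$7827fea9@computer"):
        print(msgid_url(mid))
```

If the search page turns out to expect the raw message-id instead, drop
the quote() call.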
While an error estimate would be nice to have, the naive approach is
based on an independence assumption that seems to be very wrong in our case.
I think that improved tools (in particular, quicker access to the web
pages with the "next in thread" links, or using the web page), experience
with the corner cases, and triple review (including some experienced
spam-checker) strike a good balance between reliability and effort. (I
would even claim that there is nothing of particular value among the
messages that received two spam votes, but we want to be sure and lose as
little as possible.)
hecker     pabs       tviehmann  wijnen
--- one spam vote
not spam   inapp      unknown    spam
  courier.44194498.00006B55@softhome.net
    a request to remove stuff from the archive
spam       not spam   not spam   inapp
  000a01c2f63a$55e0ff20$7827fea9@computer
    a German user complaining about Debian CDs he bought elsewhere
spam       unknown    not spam   unknown
  MABBLMGCBFOBPPCNFDIIOENGCCAA.mario_capuano@katamail.com
    an Italian user question
not spam   unknown    unknown    spam
  C03C7E6F.2EAF%jonas.hedlund@trigger.se
    someone complaining about ICQ spam matching some list spam
spam       unknown    not spam   inapp
  1be.ebc747e.2ac4b275@aol.com
    a German user looking for a translation program
spam       not spam   not spam   not spam
  20020821160713.GA8194@despayre.org
    a complaint about IRC in response to a DWN article
spam       unknown    not spam   inapp
  21916A3354A3D511946800508BB9A9F5083D0991@svntexc2.gvt.net.br
    a Portuguese user question
spam       not spam   not spam   inapp
  003d01c2ad91$64746cd0$0301a8c0@MATTHIAS
    a German (Swiss) request to be sent a t-shirt to match the swirl
    on his motor scooter
spam       not spam   not spam   not spam
  NHBBKODDALCBAECNDDJNMEDNCAAA.boufatit@sarpi-dz.com
    a French and English user question
spam       not spam   unknown    not spam
  e6a527b20606150242i2dc527a1o97d144dc9563df9@mail.gmail.com
    start of a troll thread
spam       not spam   unknown    not spam
  e6a527b20606151719w29f74ec5o88e3cd7914028855@mail.gmail.com
    further down that troll thread
not spam   not spam   not spam   spam
  004401c3aa2b$00c6a580$6501a8c0@mrfish
    an offer to redesign our web site, possibly serious
spam       unknown    not spam   inapp
  000801c2d8be$a61958a0$c13f243e@39y8vr2w2kpw8tg
    a Spanish user question
not spam   unknown    unknown    spam
  050901c3f55f$e5a50000$0202a8c0@hotbox
    a Linux portal announcement at least bordering spam
--- two spam votes
spam       unknown    not spam   spam
  1665599482.20060206122551@matic.com.pl
    a Polish user question
spam       unknown    not spam   spam
  web-26275475@mail5.rambler.ru
    someone looking (in a strange way) for someone with the same
    name as a Debian contributor who has some 256 posts on our
    English language lists between 1999/09 and 2001/10
spam       spam       not spam   unknown
  E1AI3qU-0000YN-00@gluck.debian.org
    a Spanish unsolicited software survey not directly related to
    Debian
--- three spam votes
spam       not spam   spam       spam
  000801c2f43d$c5b2b180$0d00a8c0@laszlo
    a Croatian (one-liner) user question
--- unquestionably spam
not spam   spam       spam       spam
  5.2.0.9.1.20030531203449.03c75de8@pop.videotron.ca
    link request spam
Kind regards
Thomas
1. http://wiki.debian.org/Teams/ListMaster/ListArchiveSpam
and originally, with followups, on this mailing list
http://lists.debian.org/debian-project/2007/11/msg00012.html
2. In master.d.o:~tviehmann/spam-removals/ you will find
"reports" and "proposed" removals and the Python (>=2.4) script
comparing them. The .spam files actually used reside with the
mbox archives on master:/org/lists.debian.org/lists/,
presently only four Listmaster-removed spams.
3. http://lists.debian.org/msgid-search/
use http://lists.debian.org/msgid-search/%s for quick bookmarks
--
Thomas Viehmann, http://thomas.viehmann.net/