
Proposed solution to spam problem on Debian mailing lists (and bts)



The spam discussion raises its head again, so I guess it is time for
me to write down my ideas for spam filtering on debian mailing lists
(which should work fine for the bug tracking system as well).  There
should be a debian spam-fighting mailing list to discuss this on,
rather than have the discussion occur on many different lists
repeatedly.

Most spam is fairly easy to detect (using spamassassin, DCC,
bogofilter, various DNSBLs, etc.), but if you are unwilling to allow
any false positives you wind up with a bunch of false negatives, and
vice versa.  Humans can make such decisions fairly easily, but the
volume of mail involved makes it impractical for any one person or
small group to do it on a long-term basis.

There seem to be quite a few people who want to help the debian
project, but for whatever reason aren't debian developers.
(Insufficient time, don't want long-term commitment, non-programmers,
not approved yet, or whatever.)  There may also be debian developers
with odd spare moments to do a task that requires little thinking.
Many of them have reasonable internet connections.

What I propose is a sort of moderation by self-appointed committee.
Software would pass any obvious non-spam along (signed by debian
developer or low spamassassin score) then hold anything else for
review.  Reviewers would log in to a web page, and messages would be
displayed with choices of spam, non-spam, inappropriate for list, and
undecided (give to someone else).  Messages approved by enough
randomly-selected reviewers would be accepted.  Messages not
sufficiently reviewed for a period of time (12 hours?)  would be passed
on to the list.  Rejected messages would be sent back with an
explanation.
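
As a rough sketch of the flow described above (not a finished design),
the decision for a single message might look like the following.  The
score threshold, the number of votes needed, and the rule that enough
"spam" or "inappropriate" votes reject a message are all assumptions
made for illustration; only the 12 hour timeout comes from the proposal,
and even that is tentative.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    SPAM = "spam"
    NON_SPAM = "non-spam"
    INAPPROPRIATE = "inappropriate for list"
    UNDECIDED = "undecided"  # hand the message to someone else


# Illustrative values only; all three would need tuning in practice.
SCORE_THRESHOLD = 2.0  # assumed "low spamassassin score" cutoff
VOTES_NEEDED = 3       # assumed number of agreeing reviewers
TIMEOUT_HOURS = 12     # the proposal's tentative "12 hours?"


@dataclass
class Message:
    spam_score: float
    signed_by_developer: bool = False
    age_hours: float = 0.0
    verdicts: list = field(default_factory=list)


def disposition(msg):
    """Return 'deliver', 'reject', or 'hold' for one message."""
    # Obvious non-spam skips review entirely.
    if msg.signed_by_developer or msg.spam_score < SCORE_THRESHOLD:
        return "deliver"
    approvals = sum(v is Verdict.NON_SPAM for v in msg.verdicts)
    rejections = sum(
        v in (Verdict.SPAM, Verdict.INAPPROPRIATE) for v in msg.verdicts
    )
    if approvals >= VOTES_NEEDED:
        return "deliver"
    if rejections >= VOTES_NEEDED:
        return "reject"  # sent back to the sender with an explanation
    # Not sufficiently reviewed in time: pass it on to the list.
    if msg.age_hours >= TIMEOUT_HOURS:
        return "deliver"
    return "hold"
```

"Undecided" votes count toward neither side here, so a message that
only collects those stays held until the timeout.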

Anyone can become a reviewer, with as many different reviewer IDs as
desired.  Reviewers can specify what type of messages (which mailing
lists, languages, message size) they want.  Each reviewer ID builds a
trust level as more messages are marked correctly.  (Perhaps reviewing
could count somewhat toward becoming a debian developer.)  Reviewers can
review as many or as few messages as they desire, whenever they desire.
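
To illustrate how reviewer preferences and random selection could fit
together, here is a minimal sketch.  The Reviewer fields and the
pick_reviewers helper are hypothetical names invented for this example;
the proposal only says reviewers choose mailing lists, languages, and
message size.

```python
import random
from dataclasses import dataclass


@dataclass
class Reviewer:
    reviewer_id: str
    lists: set      # mailing lists this reviewer ID wants to see
    languages: set  # languages the reviewer is willing to read
    max_size: int   # largest message size (bytes) the reviewer accepts


def eligible(reviewer, list_name, language, size):
    """True if the message matches the reviewer's stated preferences."""
    return (
        list_name in reviewer.lists
        and language in reviewer.languages
        and size <= reviewer.max_size
    )


def pick_reviewers(reviewers, list_name, language, size, count, rng=random):
    """Randomly choose up to `count` eligible reviewers for one message."""
    pool = [r for r in reviewers if eligible(r, list_name, language, size)]
    return rng.sample(pool, min(count, len(pool)))
```

Choosing at random from the eligible pool means a spammer who signs up
as a reviewer cannot choose which messages to vote on.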

An appeals process needs to exist for incorrectly marked messages.
(Both letting spam through and marking non-spam as spam.)  This needs
to be a trusted position, but should not involve a high volume of work
with enough trusted reviewers.  Reviewers who get overridden lose some
trust.
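
One possible way to track trust per reviewer ID is sketched below.  The
point values are pure assumptions; the only thing taken from the
proposal is that correct markings build trust and being overridden on
appeal costs some.  Making the penalty larger than the gain is a design
guess, meant to keep a throwaway ID from recovering quickly.

```python
from dataclasses import dataclass

# Assumed values for illustration; real numbers would need tuning.
GAIN_CORRECT = 1.0     # trust earned for a correct marking
LOSS_OVERRIDDEN = 5.0  # trust lost when an appeal overrides the reviewer


@dataclass
class ReviewerRecord:
    reviewer_id: str
    trust: float = 0.0


def record_outcome(record, reviewer_verdict, final_verdict,
                   overridden_on_appeal=False):
    """Update and return the trust level after a message is settled."""
    if reviewer_verdict == "undecided":
        return record.trust  # passing a message along is never penalized
    if reviewer_verdict == final_verdict:
        record.trust += GAIN_CORRECT
    elif overridden_on_appeal:
        # Trust never drops below zero in this sketch.
        record.trust = max(0.0, record.trust - LOSS_OVERRIDDEN)
    return record.trust
```

Simply disagreeing with the final result costs nothing here; only an
explicit override through the appeals process does.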

People complaining about too much spam on their mailing list could be
told to help review.

If accepted, this whole process will need some fine tuning.

I am willing to write or help write the software needed if it looks
like there is a fair chance (say 25%) this will be adopted.

This also builds a corpus of human-reviewed spam and non-spam, useful
for helping tune spam fighting tools.

-- 
Blars Blarson			blarson@blars.org
				http://www.blars.org/blars.html
"Text is a way we cheat time." -- Patrick Nielsen Hayden


