bits from listmasters and [RFH] lists.d.o web archive


The listmaster team's main duty is to keep our mailing list software
functional. This includes constantly improving our spam filters,
creating and maintaining lists, supporting our subscribers and providing
our list archive. Since our last update in March 2008, quite a few
things have happened. This mail summarizes some of those things.

But first of all, the listmaster team is seeking help!

Request for help: web archives
The code generating the web archives needs a major overhaul. We
currently use mhonarc for generating the web archives. It is a set of
scripts written mainly in Perl which generate the site.

The problem the listmasters have with it is that it does not properly
separate Perl code from templates, so you find huge amounts of HTML
snippets inside the Perl code.

We would like to change that, but we are currently a bit understaffed
to do it properly. So we are looking for Perl programmers who are
willing to help us rewrite the whole thing using a templating toolkit
(e.g. Template::Toolkit or Text::Template) and Perl.

Care must be taken that the links generated by the new tool do not
change, and that a regeneration of the whole archive does not break any
existing links.
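
One way to check this constraint before switching over would be to
compare the file paths of the current archive tree with those of a
regenerated copy. The following Python sketch is only an illustration:
the function names and the idea of keeping both trees side by side on
disk are assumptions, not part of the existing setup.

```python
import os


def archive_paths(root):
    """Return the set of file paths below root, relative to root."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalize separators so paths compare like URL paths.
            paths.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return paths


def missing_links(old_root, new_root):
    """List paths present in the old archive but absent from the new one."""
    return sorted(archive_paths(old_root) - archive_paths(new_root))
```

An empty result from missing_links(old_root, new_root) means every
existing path is still present. It does not prove that each page still
holds the same message, so a renumbering of messages would need a
separate check.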

An ideal solution would also be usable by the BTS to replace the current
bugreport.cgi display to allow for intelligent threading and backends
beyond mailbox and maildir. (Perhaps utilizing Mail::Box or equivalent.)

List archive spam
Debian's spirit is to be as open as possible. The same is true for our
lists, so we only restrict posting access to announcement mailing lists.
This, on the other hand, gives spammers the ability to post to our
lists. While we try to have a properly working spam filter[1], we can't
catch every single spam message.

Around March 2008 we started a huge round of spam removal in our web
archives. While this started a bit slowly in 2008 and only had support
via a direct web frontend, it got a huge push when Cord Beermann
documented [2] how the spam removal process works, Frans Pop thankfully
motivated [3] others to help with it, and Sandro Tosi set up a wiki page
about MUA plugins to report spam messages from Debian mailing lists.
Thanks to everyone who has contributed to this effort. As spam removal
in the web archives is an ongoing task, we appreciate all the help we
can get. This also gives the listmasters the ability to properly train
our spam filters.

Bug categorization
All bugs (either wishlist or functional) for our list setup are
collected on the lists.d.o pseudo-package
<http://bugs.debian.org/lists.debian.org>. We recently did a major
reorganization using the usertag feature of the BTS. This new structure
enables both listmasters and newbies to more effectively work in
specific areas of the list setup. The currently used usertags and their
meaning are documented at

How to help
You can help us in a few important areas:

 * Spam rules -- If you notice spam getting through the spam filters,
   and have ideas for improving our filters, we accept patches to our
   rulesets, which are publicly available via svn [4]. These rulesets
   are also shared with bugs.d.o.

 * Avoid bouncing spam -- If you don't want your MTA to accept spam,
   please just discard it instead of rejecting it with a 550, at least
   when a message comes from liszt.debian.org.

 * Troubleshooting -- If a message that you've sent to a mailing list
   hasn't arrived, please provide us with as much information as
   possible, including Date/Time (UTC), From, To, Message-Id, sending
   IP, and the log file entries from the sending host.

 * Joining the team -- We are always open to additional help. A good
   example is our current 'Request for Help' mentioned earlier in this
   mail. If you think you can help us in any area of our current setup,
   please contact listmaster@lists.debian.org. Check
   <http://wiki.debian.org/Teams/ListMaster> for more information about
   the listmasters team.
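
For the troubleshooting item above, most of the requested fields can be
read from a saved copy of the message. The following Python sketch is
only an illustration using the standard email module; the function name
report_fields is made up for this example. The sending IP and the log
file entries are not part of the message and still have to be taken from
the sending host.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def report_fields(raw_message):
    """Collect the header fields useful for a delivery problem report."""
    msg = message_from_string(raw_message)
    date_utc = None
    if msg["Date"]:
        sent = parsedate_to_datetime(msg["Date"])
        if sent.tzinfo is None:
            # A "-0000" zone yields a naive datetime that is already UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        date_utc = sent.astimezone(timezone.utc).isoformat()
    return {
        "Date (UTC)": date_utc,
        "From": msg["From"],
        "To": msg["To"],
        # Header lookup is case-insensitive, so "Message-ID" matches too.
        "Message-Id": msg["Message-Id"],
    }
```

A field that is missing from the message comes back as None, which is
itself worth mentioning in the report.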

On behalf of the Listmaster team,

Martin Zobel-Helas

[1] Our filters currently stop about 50000 incoming mails per day.
    2500 mails currently make it through the filters.
[2] http://lists.debian.org/debian-devel-announce/2009/04/msg00012.html
[3] http://lists.debian.org/debian-boot/2009/05/msg00045.html
[4] svn://svn.debian.org/svn/pkg-listmaster/trunk/spamassassin_config
