How to filter out mailing list spam with bogofilter
I'm a bit disappointed that the spam problem is fixed, as I was using it as an
opportunity to try to get bogofilter, which I use with KMail, to filter out
the mailing list spam.
The suggestion I got from the bogofilter mailing list is to set up an
ignorelist.db in the same directory as the wordlist.db. The FAQ on the site
gives the following instructions for doing this:
Can I tell bogofilter to ignore certain tokens?
Through the use of an ignore list, bogofilter will ignore the listed tokens
when scoring the message.
Because ignorelist.db has a lower index (7) than wordlist.db (8), bogofilter
will stop looking when it finds a token in ignorelist.db.
Note: Technically, bogofilter gives a score of ROBX to the tokens and expects
the min_dev parameter to drop them from the scoring.
< This is where I'm confused>
There are two main methods for building/maintaining an ignore list.
First, a text file can be created and maintained using any text editor.
Bogoutil can convert the text file to database format, e.g. "bogoutil -l
ignorelist.db < ignorelist.txt".
Alternatively, echo ... | bogoutil ... can be used to add a single token, for
example "ignore.me", as in:
echo ignore.me | bogoutil -l ~/ignorelist.db
<end of FAQ>
Is anybody on the list using an ignorelist.db with bogofilter, and if so, how
have they set it up?
As far as I understand it, because the Debian list is usually free of spam,
bogofilter has learned to mark its messages as ham. The suggestion from the
bogofilter list was to populate the ignorelist.db with headers from genuine
non-spam posts from the Debian list. That way, when downloading email,
bogofilter would ignore the tokens common to genuine list messages, while
spammy headers on messages from the Debian list would still be processed by
bogofilter, hopefully be classified as spam, and end up in the wastebin
rather than my Debian-user mailbox.
What I'm unsure about is how to get genuine headers from the Debian mailing
list posts into the ignorelist.db, which is why I asked earlier in this very
long diatribe whether anyone is doing this.
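
To make the question concrete, here is a rough sketch of the kind of thing I
have in mind, not a tested recipe. The function name extract_header_tokens,
the file name ham.mbox and the choice of headers are all hypothetical; only
the final bogoutil -l step comes from the FAQ. It also assumes that bare
tokens are what bogofilter looks up, which may be wrong if its lexer tags
header tokens, so compare the result against a dump of the wordlist
(bogoutil -d) before relying on it.

```shell
#!/bin/sh
# Sketch: collect candidate ignore tokens from the headers of known-good
# list posts stored in an mbox file. Folded (multi-line) headers are not
# handled; only the first physical line of each header is used.

extract_header_tokens() {
    # Keep only header lines: from each "From " separator to the first
    # blank line of that message.
    awk '/^From / { in_hdr = 1; next } /^$/ { in_hdr = 0 } in_hdr' "$1" |
        # Hypothetical choice of headers that identify the mailing list.
        grep -i -E '^(List-Id|X-Mailing-List|Precedence):' |
        # Drop the header name, keeping only its value.
        sed 's/^[^:]*: *//' |
        # Split the value into one token per line.
        tr -s ' <>\t' '\n\n\n\n' |
        grep -v '^$' |
        sort -u
}

# Only run when an mbox file is given as the first argument.
if [ "$#" -ge 1 ] && [ -f "$1" ]; then
    extract_header_tokens "$1" > ignorelist.txt
    # Loading step as given in the FAQ (review ignorelist.txt first):
    # bogoutil -l ~/ignorelist.db < ignorelist.txt
fi
```

Saved under a name of your choosing and run with ham.mbox as its argument,
this would leave an ignorelist.txt to review by hand before loading it with
the FAQ's bogoutil command.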
This is all a bit academic at the moment, as the spam flood has stopped on the
list. That is, unless someone can direct me to a mailing list that is known
for allowing spam and that I can temporarily subscribe to. If so, I can at
least see whether bogofilter's ignorelist.db is working.
Apart from filtering out mailing list spam, bogofilter is working just fine,
and I only get the odd spam or ham in the unsure mailbox. The rest of the
spam goes straight to the wastebin.
Any help would be appreciated from those who are using bogofilter and dealing
with mailing list spam.