
How to filter out mailing list spam with bogofilter



I'm a bit disappointed that the spam problem on the list has been fixed, as I 
was using it as an opportunity to get bogofilter, which I use with KMail, to 
filter out mailing list spam.

The suggestion I got from the bogofilter mailing list is to set up an 
ignorelist.db in the same directory as wordlist.db. The FAQ on the bogofilter 
site explains how to do this as follows:

<FAQ>
Can I tell bogofilter to ignore certain tokens?

Through the use of an ignore list, bogofilter will ignore the listed tokens 
when scoring the message.

Example:
 
    wordlist I,ignore,~/ignorelist.db,7
    wordlist R,system,/var/spool/bogofilter/wordlist.db,8
 

Because ignorelist.db has a lower index (7) than wordlist.db (8), bogofilter 
will stop looking when it finds a token in ignorelist.db.

Note: Technically, bogofilter gives a score of ROBX to the tokens and expects 
the min_dev parameter to drop them from the scoring.

<This is where I'm confused>

There are two main methods for building/maintaining an ignore list.

First, a text file can be created and maintained using any text editor. 
Bogoutil can convert the text file to database format, e.g. "bogoutil -l 
ignorelist.db < ignorelist.txt".

Alternatively, echo ... | bogoutil ... can be used to add a single token, for 
example "ignore.me", as in:
 
  echo ignore.me | bogoutil -l ~/ignorelist.db
<end of FAQ>
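To make the FAQ's precedence rule concrete, here is a toy Python model of it. Word lists are searched in ascending index order. A hit in an ignore (`I`) list is scored as ROBX and stops the search. Any token whose score lies within `min_dev` of 0.5 is then dropped. This is only an illustration under assumptions, not bogofilter's code. The `Wordlist` class, the in-memory dicts standing in for the .db files, and the ROBX/min_dev values used are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Wordlist:
    """Hypothetical stand-in for one `wordlist` line in the bogofilter config."""
    kind: str                 # "I" = ignore list, "R" = regular list, as in the FAQ
    name: str
    index: int                # lower index is searched first
    tokens: dict[str, float]  # token -> spamicity (values unused for "I" lists)


def lookup(token: str, wordlists: list[Wordlist], robx: float) -> float:
    """Score a token using the first list (lowest index) that contains it."""
    for wl in sorted(wordlists, key=lambda w: w.index):
        if token in wl.tokens:
            # An ignore-list hit stops the search and is scored as ROBX.
            return robx if wl.kind == "I" else wl.tokens[token]
    # In this model, tokens found in no list also fall back to ROBX.
    return robx


def scored_tokens(tokens, wordlists, robx=0.52, min_dev=0.1):
    """Keep only tokens scoring at least min_dev away from 0.5.

    robx=0.52 and min_dev=0.1 are hypothetical example values.
    """
    kept = {}
    for token in tokens:
        score = lookup(token, wordlists, robx)
        if abs(score - 0.5) >= min_dev:
            kept[token] = score
    return kept


ignore = Wordlist("I", "ignore", 7, {"debian-user.lists.debian.org": 0.0})
system = Wordlist("R", "system", 8, {"debian-user.lists.debian.org": 0.01,
                                     "viagra": 0.99})
print(scored_tokens(["debian-user.lists.debian.org", "viagra"], [system, ignore]))
```

Here the list-id token never reaches the final score, even though the system list rates it as strongly hammy. That is the "ROBX plus min_dev" effect the FAQ's note describes. It only works if ROBX itself lies within min_dev of 0.5, so the real outcome depends on your actual robx and min_dev settings.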

Is anybody on the list using an ignorelist.db with bogofilter, and if so, how 
have they set it up?

As far as I understand it, lists such as the Debian list are usually free of 
spam, so bogofilter has learned to mark their messages as ham. The suggestion 
from the bogofilter list was to populate ignorelist.db with headers from 
genuine non-spam posts to the Debian list. When I download email, bogofilter 
would then ignore the headers that genuine list messages share. The spammy 
parts of messages arriving via the Debian list would still be processed by 
bogofilter. Hopefully those messages would be classified as spam and end up in 
the wastebin rather than in my Debian-user mailbox.

What I'm unsure about is how to get genuine headers from Debian mailing list 
posts into ignorelist.db. That is why I asked, earlier in this very long 
diatribe, whether anyone is doing this.
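One way to approach this is to collect the header tokens that every genuine list post shares. These are written one per line, ready for `bogoutil -l ignorelist.db < ignorelist.txt` as in the FAQ. The Python sketch below does this for a handful of raw messages, but it rests on assumptions. The chosen header fields (`List-Id`, `List-Post`) and the simple regular-expression tokenizer are hypothetical. They do not reproduce bogofilter's own lexer, which may form header tokens differently. Before loading the output, compare it against what is actually in your wordlist, for example with `bogoutil -d wordlist.db`.

```python
from __future__ import annotations

import email
import re
from typing import Iterable

# Hypothetical tokenizer: runs of 3+ letters/digits/dots/hyphens/underscores.
# This is NOT bogofilter's lexer, so real tokens may look different.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{2,}")

# Hypothetical choice of headers that genuine list posts are expected to share.
LIST_FIELDS = ["List-Id", "List-Post"]


def header_tokens(raw_message: str, fields: Iterable[str]) -> set[str]:
    """Tokenize the values of the given header fields of one raw message."""
    msg = email.message_from_string(raw_message)
    tokens: set[str] = set()
    for field in fields:
        for value in msg.get_all(field, []):
            tokens.update(TOKEN_RE.findall(str(value)))
    return tokens


def common_header_tokens(raw_messages: Iterable[str],
                         fields: Iterable[str] = LIST_FIELDS) -> list[str]:
    """Return, sorted, the header tokens present in every message."""
    fields = list(fields)
    common = None
    for raw in raw_messages:
        tokens = header_tokens(raw, fields)
        common = tokens if common is None else common & tokens
    return sorted(common or set())


def ignorelist_text(tokens: Iterable[str]) -> str:
    """One token per line, the shape used by the FAQ's echo example."""
    return "".join(token + "\n" for token in tokens)
```

Suppose `posts` is a hypothetical list holding the raw text of a few genuine debian-user messages. Save `ignorelist_text(common_header_tokens(posts))` as ignorelist.txt, then load it with `bogoutil -l ignorelist.db < ignorelist.txt`. Intersecting across messages is deliberate: only tokens found in every sample are kept, so subject words and other per-message content stay in the scoring.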

This is all a bit academic at the moment, as the spam flood on the list has 
stopped. That is, unless someone can point me to a mailing list known for 
letting spam through that I could temporarily subscribe to. If so, I could at 
least check whether bogofilter's ignorelist.db is working.

Apart from filtering out mailing list spam, bogofilter is working just fine. 
Only the odd spam or ham lands in the unsure mailbox, and the rest of the spam 
goes straight to the wastebin.

Any help from those who are using bogofilter and dealing with mailing list 
spam would be appreciated.

Nigel.



