RFS: qsf - small and fast Bayesian spam filter
-----BEGIN PGP SIGNED MESSAGE-----
I have made a package closing an RFP (#273937).
It is released under the Artistic license. Below is the description, which
explains what the program does and how it differs from other spam filters.
Quick Spam Filter (QSF) is an Open Source email classification filter,
designed to be small, fast, and accurate, which works to classify
incoming email as either spam or non-spam. To recognise spam, QSF strips
the text out of the email (using MIME decoding and HTML stripping) and
then splits it into tokens (words, word pairs, URLs, and so on). These
tokens are then looked up in a database and analysed using the Bayesian
technique to see whether the email should be classified as spam or not.
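
To illustrate the tokenise-and-look-up idea described above, here is a minimal Python sketch. It is not QSF's actual code or scoring formula (QSF is written in C and its details are in the manual). The function names, the smoothing, and the tiny in-memory "database" are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Split text into lowercase words plus adjacent word pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def train(messages):
    """Count, per token, how many messages it appears in (hypothetical database)."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenise(msg)))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence with naive Bayes, working in log space."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from the mailbox sizes
    for tok in set(tokenise(text)):
        # Add-one smoothing (an assumption) so unseen tokens stay neutral.
        p_spam = (spam_counts[tok] + 1) / (n_spam + 2)
        p_ham = (ham_counts[tok] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


# Two toy "mailboxes" standing in for known spam and known non-spam.
spam = ["buy cheap pills now", "cheap pills online"]
ham = ["meeting agenda for monday", "lunch on monday"]
spam_counts, ham_counts = train(spam), train(ham)
```

Calling `spam_probability("cheap pills", spam_counts, ham_counts, len(spam), len(ham))` gives a value above 0.5 with this toy data, while `"monday meeting"` scores below 0.5.
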
The database is generated by a process of training - QSF is given two
mailboxes, one containing known spam, and the other containing known
non-spam, to train itself on. After training, if QSF misfiles any email,
the message it got wrong can be fed back into the database, thus making
QSF learn from its mistakes. For a more in-depth look at the way in
which QSF tokenises and classifies messages, please see the Technical
Details section of the manual. QSF is designed to be run by an MDA, such
QSF's targets are speed, accuracy, and simplicity. So:
* It is small and is written in C so it starts up quickly, unlike
filters written in Perl.
* It understands MIME and HTML, so it can intelligently deal with modern
spam, unlike older Bayesian filters such as ifile.
* It runs as an inline filter rather than as a daemon, so it is simple
* It is written to do only one job - decide whether an email is spam or
not using the content of the message alone - so it is less complex than
filters such as SpamAssassin. Less complexity means bugs and security
problems are less likely.
* As well as words and word pairs, QSF also spots special patterns in
email such as runs of gibberish, HTML comments embedded in text, and
other common spam giveaways, and its flexible tokeniser allows more
patterns to be added as spammers change their tactics.
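
The last point can be pictured as a tokeniser that emits extra marker tokens when it sees a suspicious pattern. The Python sketch below is only an illustration under assumed heuristics: the token names, the regular expressions, and the "eight or more letters without a vowel" rule are hypothetical and not taken from QSF.

```python
import re


def pattern_tokens(text):
    """Return marker tokens for suspicious patterns (hypothetical heuristics)."""
    tokens = []
    # An HTML comment splitting a word, e.g. "vi<!-- x -->agra".
    if re.search(r"\w<!--.*?-->\w", text, flags=re.DOTALL):
        tokens.append("PATTERN:embedded-html-comment")
    # A run of gibberish: a long alphabetic word containing no vowels.
    for word in re.findall(r"[A-Za-z]{8,}", text):
        if not re.search(r"[aeiouyAEIOUY]", word):
            tokens.append("PATTERN:gibberish")
            break
    return tokens
```

Marker tokens like these would be looked up in the database alongside ordinary words, so a new spammer trick only needs a new rule in the tokeniser.
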
Package and source code are available at
The package is linda and lintian clean, and it builds fine in pbuilder.
Someone may point out that there is no .orig file. That is because the
upstream author gave me access to the project CVS. He preferred that to
removing the /debian dir from the source code. He has his reasons for
keeping the /debian dir, even after I talked to him and explained the
issue with distributing the /debian dir in the source code.
Well, I think that is it. Corrections, suggestions, and criticism are all welcome! :-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
-----END PGP SIGNATURE-----