RFS: qsf - small and quick Bayesian spam filter
Some time ago (before the Sarge release) I sent an RFS to the list, but
since Sarge was about to be released, I think everyone was busy.
The package closes an RFP (#273937).
Quick Spam Filter (QSF) is an Open Source email classification filter,
designed to be small, fast, and accurate, which works to classify
incoming email as either spam or non-spam. To recognise spam, QSF strips
the text out of the email (using MIME decoding and HTML stripping) and
then splits it into tokens (words, word pairs, URLs, and so on). These
tokens are then looked up in a database and analysed using the Bayesian
technique to see whether the email should be classified as spam or not.
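As a rough illustration of the tokenise-and-classify idea described above, here is a minimal Python sketch. It is not QSF's actual code (QSF is written in C, and its real tokeniser and scoring are described in its manual). The names `tokenise` and `spam_probability`, the regular expressions, the database layout (token mapped to a pair of spam and non-spam message counts), the equal priors and the add-one smoothing are all assumptions made for this example. It also assumes MIME decoding and HTML stripping have already been done.

```python
import math
import re

# Assumed patterns for this sketch only; QSF's own tokeniser rules differ.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
WORD_RE = re.compile(r"[a-z0-9$'-]+")


def tokenise(text):
    """Split already-decoded plain text into URL, word and word-pair tokens."""
    text = text.lower()
    urls = URL_RE.findall(text)
    # Remove URLs first so they are not also split into words.
    words = WORD_RE.findall(URL_RE.sub(" ", text))
    pairs = [a + " " + b for a, b in zip(words, words[1:])]
    return set(urls) | set(words) | set(pairs)


def spam_probability(text, db, n_spam, n_ham):
    """Combine per-token evidence with a naive Bayes rule.

    db maps token -> (spam_count, ham_count), where each count is the
    number of training messages of that class containing the token.
    Tokens missing from db are ignored; priors are assumed equal.
    """
    log_odds = 0.0
    for token in tokenise(text):
        if token not in db:
            continue
        spam_count, ham_count = db[token]
        # Add-one smoothing keeps both probabilities away from zero.
        p_spam = (spam_count + 1) / (n_spam + 2)
        p_ham = (ham_count + 1) / (n_ham + 2)
        log_odds += math.log(p_spam) - math.log(p_ham)
    # Clamp so math.exp cannot overflow on extreme scores.
    log_odds = max(-700.0, min(700.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Tiny hand-made database, purely illustrative.
    db = {"viagra": (9, 0), "meeting": (0, 9)}
    for msg in ("Cheap viagra now", "Meeting moved to Tuesday"):
        verdict = "spam" if spam_probability(msg, db, 10, 10) > 0.5 else "non-spam"
        print(msg, "->", verdict)
```

A message is treated as spam here when the combined probability exceeds 0.5; a real filter would pick its threshold more carefully.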
The database is generated by a process of training - QSF is given two
mailboxes, one containing known spam, and the other containing known
non-spam, to train itself on. After training, if QSF misfiles any email,
the message it got wrong can be fed back into the database, thus making
QSF learn from its mistakes. For a more in-depth look at the way in
which QSF tokenises and classifies messages, please see the Technical
Details section of the manual. QSF is designed to be run by an MDA, such
QSF's targets are speed, accuracy, and simplicity. So:
* It is small and is written in C so it starts up quickly, unlike
filters written in Perl.
* It understands MIME and HTML, so it can intelligently deal with modern
spam, unlike older Bayesian filters such as ifile.
* It runs as an inline filter rather than as a daemon, so it is simple
* It is written to do only one job - decide whether an email is spam or
not using the content of the message alone - so it is less complex than
filters such as SpamAssassin. Less complexity means bugs and security
problems are less likely.
* As well as words and word pairs, QSF also spots special patterns in
email such as runs of gibberish, HTML comments embedded in text, and
other common spam giveaways, and its flexible tokeniser allows more
patterns to be added as spammers change their tactics.
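
The last point, special patterns handled by an extensible tokeniser, could be pictured as a table of named patterns that each contribute a marker token when they match. The Python sketch below only illustrates that idea and is not taken from QSF. The table `PATTERNS`, the token names and both regular expressions are hypothetical.

```python
import re

# Hypothetical pattern table: (marker token, compiled regex).
# Adding a new spam giveaway means adding one more entry here.
PATTERNS = [
    # An HTML comment wedged between two word characters, e.g. V<!-- x -->iagra.
    ("PATTERN:html-comment-in-word", re.compile(r"\w<!--.*?-->\w", re.S)),
    # A crude gibberish test: a standalone run of six or more consonants.
    ("PATTERN:gibberish-run", re.compile(r"\b[bcdfghjklmnpqrstvwxz]{6,}\b", re.I)),
]


def pattern_tokens(raw):
    """Return the marker tokens for every pattern found in the raw text."""
    return {name for name, rx in PATTERNS if rx.search(raw)}
```

The resulting marker tokens would be counted in the database alongside ordinary word tokens, so the filter can learn how strongly each pattern indicates spam.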
It is released under the Artistic License, is linda and lintian clean,
and builds fine in pbuilder.
Package, source and other files available at
The upstream author gave me access to the project CVS, so the diff file
is small. If I change something in the package, I will update CVS, and
Thank you very much!