
Re: Looking for a sponsor for bmf



On Sun, Oct 06, 2002 at 10:59:08AM +0200, martin f krafft wrote:
> also sprach Tom Marshall <tommy@home.tig-grr.com> [2002.10.06.0608 +0200]:
> > I've written a Bayesian e-mail filter,
> > http://sourceforge.net/projects/bmf/ From the project page:
> > 
> >   bmf is a self contained and extremely efficient Bayesian mail
> >   filter. See Paul Graham's article "A Plan for Spam" for background
> >   information. It aims to be faster, smaller, and more versatile
> >   than similar applications.
> 
> bogofilter already implements this.
> 
> > I would like to become a Debian developer to maintain my own package
> > (and a few others) and find a sponsor.  Toward that end, I am also
> > in the process of contacting a current Debian developer to sign my
> > GPG key but he has not yet responded.  If anyone reading this
> > message is in or near Seattle, WA and would like to sign my key,
> > please let me know.
> 
> I'll sponsor it if you can tell me why it's better than bogofilter.
> Where are the Debian sources?

- It is small.  bogofilter is over 30 times its size.  spamprobe is over 7
  times its size.

- It is versatile.  It supports text files (compatible with bogofilter 0.6),
  libdb v1 to v4 (compatible with bogofilter 0.7.x), and MySQL.  bogofilter
  and spamprobe support only recent versions of libdb.

- It is efficient.  The incoming text is not copied to form a data
  structure.  Sorted vectors are used to store data.  It makes orders of
  magnitude fewer calls to memory allocation functions than bogofilter does.

- It supports scoring and updating the word lists in a single invocation.
  bogofilter only gained this in CVS less than two days ago.

- It includes a utility to convert between supported formats.  bogofilter
  and spamprobe have no facility to import/export lists.

- The parser is handcrafted and is easily made to recognize the unique
  format of an email message (multiline headers, case insensitivity in
  header names, etc.).  This would be difficult to do in a lex grammar.

- It does not rely on external data structure libraries.  The current
  release of bogofilter uses libJudy, which is a pain to download and
  compile.  Yes, I realize that CVS and the current .deb have dropped the
  libJudy dependency.

- It is highly portable.  It's written in C and compiles cleanly with no
  compiler warnings on several architectures.

- Its author promises not to break backward compatibility without providing
  a clean upgrade path.
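
To illustrate the "sorted vectors" and "text is not copied" points above,
here is a minimal sketch rather than bmf's actual code.  The names entry_t,
vec_t, vec_add and vec_get are hypothetical.  Each entry points into the
message buffer (pointer plus length), and lookups use a binary search, so
the only allocation is the occasional doubling of the vector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not bmf's real structures. */
typedef struct {
    const char *word;   /* points into the message buffer, not a copy */
    size_t      len;    /* length of the word in bytes */
    unsigned    count;  /* number of occurrences seen */
} entry_t;

typedef struct {
    entry_t *items;     /* kept sorted by (bytes, length) */
    size_t   len;
    size_t   cap;
} vec_t;

/* Order two byte spans: bytewise first, shorter span first on a tie. */
static int span_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int c = memcmp(a, b, n);
    if (c != 0)
        return c;
    return (alen > blen) - (alen < blen);
}

/* Binary search: index of the first entry that is >= the key. */
static size_t vec_lower_bound(const vec_t *v, const char *w, size_t wlen)
{
    size_t lo = 0, hi = v->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (span_cmp(v->items[mid].word, v->items[mid].len, w, wlen) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Add n to the word's count, inserting it in sorted position if new.
 * Returns 0 on success, -1 if the vector could not be grown. */
static int vec_add(vec_t *v, const char *w, size_t wlen, unsigned n)
{
    assert(v != NULL && w != NULL);
    size_t pos = vec_lower_bound(v, w, wlen);
    if (pos < v->len &&
        span_cmp(v->items[pos].word, v->items[pos].len, w, wlen) == 0) {
        v->items[pos].count += n;
        return 0;
    }
    if (v->len == v->cap) {
        /* Doubling keeps the number of realloc calls logarithmic. */
        size_t ncap = v->cap ? v->cap * 2 : 16;
        entry_t *p = realloc(v->items, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        v->items = p;
        v->cap = ncap;
    }
    /* Shift the tail up by one slot to make room at pos. */
    memmove(&v->items[pos + 1], &v->items[pos],
            (v->len - pos) * sizeof v->items[0]);
    v->items[pos].word = w;
    v->items[pos].len = wlen;
    v->items[pos].count = n;
    v->len++;
    return 0;
}

/* Look up a word's count; 0 if it has not been seen. */
static unsigned vec_get(const vec_t *v, const char *w, size_t wlen)
{
    size_t pos = vec_lower_bound(v, w, wlen);
    if (pos < v->len &&
        span_cmp(v->items[pos].word, v->items[pos].len, w, wlen) == 0)
        return v->items[pos].count;
    return 0;
}
```

The tradeoff is that an insert in the middle costs a memmove, which is
cheap for the few hundred distinct words in a typical message.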

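As a rough idea of what a handcrafted parser has to do for multiline headers
and case-insensitive header names, here is a short sketch.  It is not taken
from bmf; the function names are made up, and it assumes LF line endings
(a CR before the LF would be counted as part of the field).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the field name: the bytes before the first ':' on the
 * first line.  Returns 0 if the line has no colon. */
static size_t header_name_len(const char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len && buf[i] != '\n'; i++) {
        if (buf[i] == ':')
            return i;
    }
    return 0;
}

/* Case-insensitive comparison of an n-byte field name against a
 * NUL-terminated name.  Returns 1 on a match, 0 otherwise. */
static int header_name_is(const char *line, size_t n, const char *name)
{
    size_t i;
    assert(line != NULL && name != NULL);
    for (i = 0; i < n && name[i] != '\0'; i++) {
        if (tolower((unsigned char)line[i]) !=
            tolower((unsigned char)name[i]))
            return 0;
    }
    return i == n && name[i] == '\0';
}

/* Length of one header field including its continuation lines.
 * A line that starts with a space or tab continues the previous
 * field.  The final newline is not counted. */
static size_t header_field_len(const char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        /* Advance to the end of the current line. */
        while (i < len && buf[i] != '\n')
            i++;
        /* Folded header: step over the newline and keep scanning. */
        if (i + 1 < len && (buf[i + 1] == ' ' || buf[i + 1] == '\t'))
            i++;
        else
            break;
    }
    return i;
}
```

Given "SUBJECT: hello\n world\nFrom: a@b\n", the first field spans both
physical lines and its name matches "Subject" regardless of case.
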
My initial shot at a Debian package is available on sourceforge.net along
with the tarball and rpm package.  I freely admit that the current version
is hacked up from looking at existing packages.  I'm here because I want to
learn the Right Way to do it.

-- 
Majority, n.:
        That quality that distinguishes a crime from a law.
