
Bug#265870: RFP: dspam -- highly scalable, large scale, multi-statistic spam analyzer for MDAs



Package: wnpp
Severity: wishlist

* Package name    : dspam
  Version         : 3.1.0
  Upstream Author : Jonathan A. Zdziarski <jonathan@nuclearelephant.com>
* URL             : http://www.nuclearelephant.com/projects/dspam/
* License         : GPL
  Description     : highly scalable, large scale, multi-statistic spam analyzer and filter

System-wide filtering that requires no administrative maintenance.
The DSPAM agent masquerades as the email server's delivery agent (or
as a proxy agent if necessary), providing filtering at the server
level.

A simple-to-use learning mechanism. DSPAM allows users to simply
forward their spam to their "spam email address" for learning,
eliminating any learning curve necessary to make it usable by your
customers. The information used in every calculation is temporarily
stored on the server, enabling DSPAM to relearn the original message
by looking for a small signature in the forwarded spam. As a result,
users don't have to be trained to 'bounce' messages around, and
administrators don't have to worry about incompatible mail clients.
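
The signature-based relearning described above can be sketched roughly
as follows. This is an illustration only, not DSPAM's code: the
"!SIG:...!" marker format, the in-memory dictionaries, and the function
names deliver and learn_spam_from_forward are all assumptions made up
for the example.

```python
import re
import secrets

# Hypothetical marker format; the real DSPAM signature syntax may differ.
SIGNATURE_RE = re.compile(r"!SIG:([0-9a-f]{16})!")

signature_store = {}  # signature id -> tokens seen at delivery time
spam_counts = {}      # token -> number of times learned as spam


def tokenize(text):
    """Split a message body into lowercase word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


def deliver(body):
    """Remember the tokens used for this message and tag it with a signature."""
    sig = secrets.token_hex(8)  # 16 hex characters
    signature_store[sig] = tokenize(body)
    return f"{body}\n\n!SIG:{sig}!\n"


def learn_spam_from_forward(forwarded):
    """Relearn the original message as spam using only the embedded signature.

    The forwarded text may be quoted or reformatted by the mail client;
    only the small marker has to survive.
    """
    match = SIGNATURE_RE.search(forwarded)
    if match is None:
        return False
    tokens = signature_store.pop(match.group(1), None)
    if tokens is None:
        return False  # unknown or already processed signature
    for token in tokens:
        spam_counts[token] = spam_counts.get(token, 0) + 1
    return True
```

Because the tokens are looked up by signature rather than re-parsed
from the forwarded text, quoting added by the user's mail client does
not affect what is learned.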

Support for a variety of storage implementations. DSPAM's storage
driver API allows the administrator to choose how they wish to store
data. Currently supported drivers include SQLite, Berkeley DB3,
Berkeley DB4, MySQL, PostgreSQL, and Oracle.
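
To illustrate the idea of a pluggable storage driver, here is a minimal
sketch with two interchangeable backends. It is not DSPAM's actual
driver API: the TokenStore interface, its method names, and the table
layout are hypothetical and chosen only for the example.

```python
import sqlite3
from abc import ABC, abstractmethod


class TokenStore(ABC):
    """Hypothetical driver interface: per-token spam and ham counters."""

    @abstractmethod
    def increment(self, token, is_spam):
        """Add one spam or ham observation for the token."""

    @abstractmethod
    def counts(self, token):
        """Return (spam_count, ham_count) for the token."""


class MemoryStore(TokenStore):
    """Keeps counters in a plain dictionary."""

    def __init__(self):
        self._data = {}

    def increment(self, token, is_spam):
        spam, ham = self._data.get(token, (0, 0))
        self._data[token] = (spam + 1, ham) if is_spam else (spam, ham + 1)

    def counts(self, token):
        return self._data.get(token, (0, 0))


class SQLiteStore(TokenStore):
    """Keeps counters in an SQLite table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS tokens ("
            "token TEXT PRIMARY KEY, "
            "spam INTEGER NOT NULL DEFAULT 0, "
            "ham INTEGER NOT NULL DEFAULT 0)"
        )

    def increment(self, token, is_spam):
        column = "spam" if is_spam else "ham"  # fixed names, never user input
        self._db.execute(
            "INSERT OR IGNORE INTO tokens (token) VALUES (?)", (token,)
        )
        self._db.execute(
            f"UPDATE tokens SET {column} = {column} + 1 WHERE token = ?",
            (token,),
        )
        self._db.commit()

    def counts(self, token):
        row = self._db.execute(
            "SELECT spam, ham FROM tokens WHERE token = ?", (token,)
        ).fetchone()
        return tuple(row) if row else (0, 0)
```

Code that calls only increment and counts works unchanged with either
backend, which is the point of putting storage behind a driver
interface.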

Multi-Algorithm Support. DSPAM presently supports the following
combination algorithms: Graham-Bayesian, Burton-Bayesian, Robinson's
Geometric Mean, and Fisher-Robinson's Chi-Square. The administrator
may choose one or more of these algorithms for spam scoring, and may
combine two or more for broader filtering coverage.
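
For readers unfamiliar with these combination algorithms, the sketch
below shows the commonly published forms of three of them, each taking
a list of per-token spam probabilities and returning a score near 1 for
spam and near 0 for legitimate mail. These are textbook formulas, not
code taken from DSPAM, whose implementation may differ in detail;
Burton-Bayesian is omitted. The function names are illustrative, and
the inputs are assumed to lie strictly between 0 and 1.

```python
import math


def graham(probs):
    """Graham's naive Bayesian combination of token probabilities."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def robinson_geometric_mean(probs):
    """Robinson's geometric-mean combination, rescaled to the range 0..1."""
    n = len(probs)
    p = 1.0 - math.prod(1.0 - x for x in probs) ** (1.0 / n)  # spamminess
    q = 1.0 - math.prod(probs) ** (1.0 / n)                   # hamminess
    return (1.0 + (p - q) / (p + q)) / 2.0


def chi2q(x2, dof):
    """Chi-square survival function for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_robinson(probs):
    """Fisher-Robinson chi-square combination of token probabilities."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0
```

All three agree on clear-cut input and differ mainly in how sharply
they push mixed evidence toward 0 or 1; the chi-square form tends to
return values near 0.5 when the evidence conflicts.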

A strong focus on large-scale implementation support. The largest
reported DSPAM installation involves 125,000 users, with the next
largest being around 100,000, then 70,000. DSPAM has been designed to
run with a very short execution time (between 0.01s and 0.03s real
time for classification and between 0.03s and 0.10s real time for
training, on average hardware), and has been equipped with a storage
driver API allowing several different storage mechanisms to be used.
Depending on disk space constraints, accuracy can be traded for
reduced disk usage, or vice versa.

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.4.26.20040601
Locale: LANG=C, LC_CTYPE=C (ignored: LC_ALL set to en_US)


