Maia Roadmap
- To: debian-nonprofit@lists.debian.org
- Subject: Maia Roadmap
- From: Robert Guerra <rguerra@lists.privaterra.org>
- Date: Sat, 4 Sep 2004 12:12:38 -0400
- Message-id: <p06200a14bd5f992ed9cc@[192.168.3.13]>
- In-reply-to: <5852868B65F75549A932442514644E100221470E@cubert.internal.mott.org>
- References: <5852868B65F75549A932442514644E100221470E@cubert.internal.mott.org>
I'm forwarding a recent message posted to the maia-mailguard
mailing list that might be of interest to the Linux users and
developers on this list.
If you aren't aware of Maia Mailguard, you should be, as it's one of
the best anti-spam solutions out there.
regards
Robert
Date: Sat, 04 Sep 2004 06:55:49 -0700
From: Robert LeBlanc <rjl@renaissoft.com>
Organization: Renaissoft, Inc.
It's been a while since I've sat down to put my thoughts on paper
about where Maia is headed, so I figured I'd take a little time now
to do just that.
In the near-term, of course, the goal is to get a "final" 1.0 version
of Maia out there, including support for Smarty templates and JpGraph
charts. This is coming along nicely, and we should be on track for
that release before the end of this month.
What I really want to talk about here, though, are the longer-term
design goals for Maia. I'll define these as "mid-term" (1.x) and
"long-term" (2.x).
Mid-Term Features:
(1) Automated site-specific DNSBL
With the addition of the rbldnsd package, some additional Perl
scripts and code to extract the connecting peer's IP address from
mail headers, it will be possible to maintain a database table of
offending IP addresses. The database would store information about
the earliest connection attempt from a given IP address, the most
recent connection attempt, and the total number of connection
attempts to date. From this data, a Perl script can create a DNSBL
zone file at regular intervals for rbldnsd to use locally. This
local DNSBL can then be used either by your upstream MTA for
blocking, or by SpamAssassin for scoring purposes.
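As a rough illustration of the bookkeeping described above, here is a
minimal Python sketch (the roadmap itself talks about Perl scripts and
a database table; the in-memory dictionary, the function names and the
regular expression here are all assumptions made for the example). The
output lines follow what I understand to be the simple one-address-
per-line form of an rbldnsd ip4set dataset, with a default A/TXT value
line first; check that against the rbldnsd documentation before
relying on it.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Matches the bracketed IPv4 address that many MTAs put in a Received header,
# e.g. "from mail.example.net (unknown [203.0.113.7]) by mx.example.org ..."
RECEIVED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


@dataclass
class Offender:
    first_seen: datetime  # earliest connection attempt
    last_seen: datetime   # most recent connection attempt
    attempts: int = 1     # total connection attempts to date


def peer_ip(received_header):
    """Return the first bracketed IPv4 address in a Received header, or None."""
    match = RECEIVED_IP.search(received_header)
    return match.group(1) if match else None


def record_attempt(table, ip, when):
    """Update the per-IP record (hypothetical stand-in for the database table)."""
    entry = table.get(ip)
    if entry is None:
        table[ip] = Offender(first_seen=when, last_seen=when)
    else:
        entry.first_seen = min(entry.first_seen, when)
        entry.last_seen = max(entry.last_seen, when)
        entry.attempts += 1


def zone_lines(table):
    """Render a default value line followed by one listed address per line."""
    lines = [":127.0.0.2:Listed by local Maia DNSBL"]  # assumed ip4set syntax
    lines.extend(sorted(table))
    return lines
```

A cron job could call something like zone_lines() at regular
intervals and write the result to the file that rbldnsd serves.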
Extra scripts would then allow you to do things like auto-expire
entries from this database according to a configurable set of
criteria--e.g. how long it's been since the last connection attempt
from that IP address, how many pieces of confirmed spam/viruses have
been received from that address, etc. A first-time offender, for
instance, might only stay on the list for 24 hours, whereas a second
offense might extend its stay on the list for 3 days, a third offense
would extend it for 7 days, and so on, in an exponential progression
of some sort.
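The 1, 3, 7-day example happens to fit the formula 2**N - 1 days for
the Nth offense. The sketch below simply assumes that formula; the
roadmap only says "an exponential progression of some sort", so treat
the function name and the exact curve as illustrative.

```python
from datetime import timedelta


def listing_duration(offense_number):
    """Time on the list for the Nth offense: 1, 3, 7, 15, ... days (2**N - 1)."""
    if offense_number < 1:
        raise ValueError("offense_number starts at 1")
    return timedelta(days=2 ** offense_number - 1)
```

Under this assumption a fourth offense would mean 15 days on the
list, and a fifth would mean 31.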
Additionally, that local DNSBL could be used to do more extensive
kinds of blocking, based on the "severity level". Perhaps a site
that has only been listed 1-10 times should only have mail service
denied, but at 11+ offenses you might want to auto-add such addresses
to a packet-level filter (e.g. IPTables, IPChains, etc.) to block
other protocols as well. With the information conveniently stored
in a database, scripts can access that data and do whatever you want
with it.
Obviously this would also require some sort of IP whitelisting
facility, to ensure that certain "known-good" IP addresses are never
listed under any circumstances.
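The severity levels and the whitelist check could be combined along
these lines. This is only a sketch: the whitelisted range, the
function names and the returned labels are hypothetical, and the
1-10 / 11+ thresholds are the ones used as an example above.

```python
import ipaddress

# Hypothetical "known-good" networks that must never be listed.
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]


def is_whitelisted(ip):
    """True if the address falls inside any whitelisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)


def block_level(ip, offenses):
    """Return "none", "mail" (deny mail only) or "packet" (packet-level filter)."""
    if offenses < 1 or is_whitelisted(ip):
        return "none"
    return "mail" if offenses <= 10 else "packet"
```

A script reading the database would call block_level() for each
record and then update the DNSBL or the packet filter accordingly.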
(2) SpamCannibal integration
Following closely on the first feature (1), it's not a big step at
all to (optionally) integrate Maia with the SpamCannibal tarpit
network <http://www.spamcannibal.org/>. SpamCannibal uses a similar
database-driven list of IP addresses, but rather than blocking mail
outright it tarpits connections from listed addresses instead (a.k.a.
throttling or teergrubing; greylisting is a different technique that
temporarily rejects mail rather than slowing the connection). The
basic idea is that the connecting peer is slowed to a crawl by having
to send its mail in tiny pieces with lengthy delays in between, and
no packet acknowledgements. This tactic has proven to be quite
effective against DDoS attacks, and coincidentally works quite well
against spam and worm-spew from zombies (compromised machines, often
running open proxies).
The SpamCannibal solution itself is just one implementation of this
kind of mechanism, and may not be the one that Maia ultimately mates
with, simply because it relies on the Linux IPTables facility to do
the work. This is an elegant solution, since it's MTA-independent,
but it's also not particularly portable. I'm open to suggestions for
more portable solutions, of course.
One of the key advantages to tarpitting is that it ties up the
spammer's resources for hours, days, or even months. This prevents
the connecting peer from making repeated attempts (it still thinks
the first attempt is succeeding, albeit slowly), and limits the
resources it has available with which to spam other people.
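To get a feel for the time scales involved, the arithmetic is just
the number of pieces multiplied by the delay between them. The figures
in the usage note are made-up inputs chosen for illustration, not
measurements from SpamCannibal or any other tarpit.

```python
import math


def tarpit_seconds(message_bytes, chunk_bytes, delay_seconds):
    """Time to push a message through a tarpit that admits one small
    chunk per delay interval."""
    chunks = math.ceil(message_bytes / chunk_bytes)
    return chunks * delay_seconds
```

For example, a 20000-byte message forced into 10-byte pieces with a
30-second pause after each one would take tarpit_seconds(20000, 10, 30)
seconds, i.e. 60000 seconds or roughly 16.7 hours, for a single
delivery attempt.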
SpamCannibal also maintains a network of shared data for its DNSBL,
with SpamCannibal sites submitting new records to this shared
database and downloading current versions of the database at regular
intervals. I had similar plans of my own, with the Maia Network (see
(3) below), but there's no reason Maia sites can't share data with
the SpamCannibal network as well. This would provide yet another
collaborative spam-reporting option for sites running Maia.
(3) The Maia Network
One of the originally-stated design goals for Maia was to leverage
the data being gathered by other Maia users in such a way that this
data could help others in their war on spam and malware. As we've
seen from services like Razor, Pyzor, DCC, and DNSBLs, collaborative
networks are powerful things. Each of these is slightly different
from the others, either in terms of how the data is gathered or what
type of data is gathered. With the Maia Network I propose to add
something new in both respects.
First of all, the Maia Network will serve as a source for aggregate
statistics, as gathered by all the participating Maia sites around
the world (participation will be optional, of course). From a
data-mining perspective, this allows us to chart global and regional
trends, and get a "big picture" view of what's going on out there.
Second, by sharing information from local Maia DNSBLs, a "master
database" can be compiled and used to form a proper Maia DNSBL, which
can then be consulted by participating Maia sites and used for MTA
blocking or SpamAssassin scoring.
Third, and perhaps most ambitiously, by sharing tokenized data from
Bayes databases, we have the ability to assemble a "master Bayes
database" that serves as an ever-growing corpus of spam and ham, but
without privacy concerns. Since the data is tokenized before being
uploaded to the Maia Network, no actual e-mail is sent, just the
handful of tokens (words) that registered highly as ham or spam in
your Bayes database. Think of this as a site-wide Bayes taken to an
even higher level--a "global Bayes". This is effectively what the
SpamAssassin folks use to do their scoring analyses (though in their
case they use the actual e-mails and run them through a "mass-check"
script to extract the tokens). By just uploading the tokens, we save
bandwidth and eliminate a serious privacy concern, while ending up
with the same data. This would let us do interesting things, such as
auto-generate balanced scores for SpamAssassin rules, based on the
spam and ham received by participating Maia sites. Those sites could
then download new balanced score sets at regular intervals, without
having to wait months for the next official SpamAssassin release.
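The token-sharing idea could look something like the sketch below. It
assumes a local table mapping each token to (spam count, ham count),
uses a naive frequency ratio in place of a real Bayes probability, and
invents the threshold and minimum-count parameters; none of this is
taken from SpamAssassin's actual Bayes storage format.

```python
def spam_probability(spam_count, ham_count):
    """Naive share of sightings that were spam (0.5 if never seen)."""
    total = spam_count + ham_count
    return spam_count / total if total else 0.5


def tokens_to_share(counts, threshold=0.9, min_seen=5):
    """Keep only tokens that register strongly as spam or as ham.

    counts maps token -> (spam_count, ham_count).
    """
    shared = {}
    for token, (spam, ham) in counts.items():
        if spam + ham < min_seen:
            continue  # too rare to say anything useful
        p = spam_probability(spam, ham)
        if p >= threshold or p <= 1 - threshold:
            shared[token] = (spam, ham)
    return shared


def merge(master, upload):
    """Add one site's uploaded counts into the shared "global Bayes" table."""
    for token, (spam, ham) in upload.items():
        old_spam, old_ham = master.get(token, (0, 0))
        master[token] = (old_spam + spam, old_ham + ham)
    return master
```

Only the token counts leave the site; the messages they came from
never do.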
Fourth, the genetic algorithms used to do the score-balancing
described above can be distributed across participating Maia sites,
in order to get the processing done in less time. The SpamAssassin
folks currently take up to four *weeks* to balance scores, due to the
processing-intensive nature of the algorithms involved. That time
could be reduced considerably by parallelizing the operations and
distributing them. Having a supercomputing cluster handy for this
task would be nice, but it's hardly necessary, really--a few dozen
machines around the world that volunteer to contribute some spare
cycles could do the job just as effectively. The more machines we
have participating, obviously, the more frequently we'll be able to
issue new balanced score sets.
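One way to picture the distribution: the expensive step in score
balancing is evaluating a candidate score set against a large labeled
corpus, and that evaluation splits cleanly by site. In the sketch
below each "shard" stands for one site's labeled messages (recorded as
the list of rules hit plus a spam/ham flag), and local threads stand in
for remote machines. The rule names and data layout are hypothetical;
5.0 is SpamAssassin's default required score. The genetic algorithm
itself is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 5.0  # score at or above which a message is flagged as spam


def shard_errors(scores, shard):
    """Count (false_positives, false_negatives) for one site's messages."""
    fp = fn = 0
    for rules_hit, is_spam in shard:
        total = sum(scores.get(rule, 0.0) for rule in rules_hit)
        flagged = total >= THRESHOLD
        if flagged and not is_spam:
            fp += 1
        elif is_spam and not flagged:
            fn += 1
    return fp, fn


def fitness(scores, shards):
    """Evaluate one candidate score set across all shards in parallel."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda shard: shard_errors(scores, shard), shards))
    return sum(fp for fp, _ in results), sum(fn for _, fn in results)
```

Each participating site only has to return two small integers per
candidate, so the coordination traffic stays tiny compared with the
computation.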
(4) Enhanced reporting options
If you've been following this list for any length of time, you'll
realize that I'm generally not content with solutions that merely
shield users from spam and malware. A lot of anti-spam and
anti-virus packages do wonderful things to make the problem seem to
"go away" for the people they protect, but do very little to attack
the problem itself. (A classic example of this kind of thinking is
the so-called "challenge/response" system, which shields users from
spam and worm-spew but at the cost of making the problem worse for
everyone else.)
This is why I'm fond of collaborative reporting networks like Razor,
DCC, Pyzor, and SpamCannibal--by sharing data with these networks,
others out there can benefit as well. Taking things a step further,
though, it seems to me that we can be doing a lot more with the
"evidence" that we're gathering in our quarantines and our spamtraps.
We've gone to the trouble of collecting and classifying the mail,
after all, so why shouldn't we get as much mileage as we can out of
it before we throw it away?
SpamCop comes to mind as one good place to file such reports,
although reporting in bulk to SpamCop isn't free, and their free
reporting service was severely rate-limited, last time I checked.
Having said that, why couldn't Maia do the same kind of job itself?
The bulk of the work is header analysis, trying to identify forged
headers and the point of injection of the e-mail into the mail
system, in order to determine who the appropriate abuse contact(s)
are. The rest of the SpamCop system consists of a complaint-tracking
mechanism with tie-ins to a DNSBL for offenders who choose to ignore
the complaints. None of this is difficult to implement: clickable
URLs with hashed tokens can uniquely identify a complaint record in
the database, and a web form can let the abuse contact respond to the
complaint (just as SpamCop does).
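A hashed token of this kind could be produced with a keyed hash, so
that an abuse contact can follow the link but cannot guess the link
for someone else's complaint. This is a sketch only: the secret key,
the maia.example.org host and the URL layout are placeholders, and
HMAC-SHA256 is my choice here, not something the roadmap specifies.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-site-secret"  # placeholder, keep private


def complaint_token(complaint_id):
    """Keyed hash that ties a token to one complaint record."""
    message = str(complaint_id).encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def complaint_url(complaint_id):
    """Clickable link to send to the abuse contact (hypothetical URL layout)."""
    token = complaint_token(complaint_id)
    return f"https://maia.example.org/complaint?id={complaint_id}&token={token}"


def verify(complaint_id, token):
    """Check a token from an incoming request in constant time."""
    return hmac.compare_digest(complaint_token(complaint_id), token)
```

The web form handler would call verify() before showing or updating
the complaint record.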
Long-Term Features:
In the longer term (i.e. Maia 2.x), I have it in mind to redesign
Maia from the ground up as a tightly-integrated spam, malware, and
content-filtering system.
(1) Replace amavisd-new with maiad
While the 1.x series is based on amavisd-new (and patches amavisd-new
quite heavily), I expect that by 2.x I will have replaced amavisd-new
with something purpose-built for Maia's needs (e.g. "maiad"). To be
clear, this is not because amavisd-new is at all a poor product; it's
simply that its design goals are somewhat different from Maia's, and
consequently I've had to work with (or address) some of its
limitations to get it to do what Maia needs. Eventually the patching
effort becomes too cumbersome to maintain, and it makes more sense to
start from scratch.
(2) Plug-in architecture
The eventual maiad will be designed with a plug-in architecture in
mind, so that "filter modules" can be written by third parties and
combined by the end user as desired, much the same as Apache supports
modules for additional functionality. The principle is simply that
maiad is a mail processor--it receives mail from an upstream MTA,
calls one or more filter modules, then passes the filtered result to
a downstream MTA. The selection of filter modules, the order in which
they're applied, and the logic that dictates the flow of mail through
these filters (i.e. the finite-state machine) should all be
configurable. Special "dummy filter" modules can add non-filter
functionality as necessary.
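The shape of such a filter chain might be sketched as follows. Since
maiad does not exist yet, everything here is hypothetical: the Mail
class, the filter names, and the convention that a filter returns
CONTINUE or STOP to steer the flow. The string checks are trivial
stand-ins for real scanners.

```python
from dataclasses import dataclass, field

CONTINUE, STOP = "continue", "stop"


@dataclass
class Mail:
    sender: str
    body: str
    notes: list = field(default_factory=list)
    deliver: bool = True


def virus_filter(mail):
    """Stand-in scanner: treats a marker string as malware and halts the chain."""
    if "EICAR" in mail.body:
        mail.deliver = False
        mail.notes.append("virus")
        return STOP
    return CONTINUE


def spam_filter(mail):
    """Stand-in spam check: tags the mail but lets processing continue."""
    if "buy now" in mail.body.lower():
        mail.notes.append("spam")
    return CONTINUE


def audit_filter(mail):
    """A "dummy filter": adds non-filter behaviour (an audit note) to the chain."""
    mail.notes.append("audited")
    return CONTINUE


def process(mail, filters):
    """Run the configured filters in order until one of them says STOP."""
    for filter_fn in filters:
        if filter_fn(mail) == STOP:
            break
    return mail
```

The list passed to process() plays the role of the configuration: the
end user picks which modules to load and in what order.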
(3) Headless API
Maia 2.x will also be essentially "headless"--that is, it will not be
designed with the web browser as its sole interface. Instead it will
be built as a library of functions (an API) that can be called from a
variety of sources--PHP scripts, Java applets, C/C++ applications,
and so on. This will make it much easier to integrate Maia into
other software packages (e.g. SquirrelMail, Horde IMP, cPanel,
Webmin, Outlook, Eudora, Thunderbird, etc.). That said, a PHP-based
interface will still be provided with the distribution, but it will
be built on top of the API.
(4) Appliance considerations
There's a lot of interest in using Maia in an anti-spam/malware
appliance context, essentially combining all of the necessary
packages into a custom distribution that fits neatly on a CD with a
minimal operating system, so that a machine can be booted (or
preinstalled) with this CD to serve as a standalone appliance. A
number of the currently-popular "spam firewalls" out there, such as
the Barracuda product, are built around a SpamAssassin core, just as
Maia is, but lack much of Maia's advanced functionality. The
headless API design of Maia 2.x will make it easy to assemble
different kinds of distribution bundles--one for a standalone box,
another for an array scenario, etc.
(5) Support for both MX-redirects and POP/IMAP-redirects
To my surprise, some of the most interested Maia users have turned
out to be companies that do offsite filtering of e-mail for
downstream clients. Generally this is done by redirecting the MX
records for the client's domain to point to the filtering company,
and Maia works fine for this purpose. Another segment of this
market, though, is the individual customer who wants to have the
e-mail to his specific address filtered--something typically done by
redirecting his POP/IMAP account so that the filtering company
fetches his mail for him and filters it, after which he picks up the
mail from the filtering company's POP/IMAP server instead. Maia can
handle this as well, with the use of something like fetchmail to get
the POP/IMAP mail and feed it by SMTP to amavisd, but it's not
something Maia was designed to do, so some outside scripts and hacks
are required. With Maia 2.x, there will be administration tools to
handle this type of individual customer as well as domain-based
customers.
(6) Seek corporate sponsorship
An undertaking the size of Maia 2.x is non-trivial, and if I (and
possibly a small team of developers) are to be able to devote more
than just our spare time to this project, some sort of sponsorship
will be necessary. Companies interested in having their name and
branding associated with the Maia Mailguard project as sponsors have
the ability to garner some goodwill from the community and help fund
a full-time development effort. Work that might take a year of
spare-time effort could conceivably be finished in 4-6 months if
there were funding in place from sponsors. Interested parties should
feel free to contact me privately to discuss this further.
Prioritised commercial support options are also being considered, for
those businesses that need or want more than basic mailing list
support; this model seems to have worked well enough for the MySQL
developers.
To be clear, though, I want Maia 2.x to remain a free and open source
product. I may experiment at some point with differently-licensed
versions for different sorts of commercial users/applications (e.g.
an appliance license, a third-party filtering license, etc.), but a
free option (perhaps with branding required) should always remain.
--
"The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore, all
progress depends on the unreasonable man." -- George Bernard Shaw
Robert LeBlanc <rjl@renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.renaissoft.com/maia/>
_______________________________________________
Maia-users mailing list
Maia-users@renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users
--
###
Robert Guerra <rguerra@privaterra.org>
Privaterra - <http://www.privaterra.org>