
Maia Roadmap



I'm forwarding the recent message posted to the maia-mailguard mailing list that might be of interest to the Linux users and developers on this list.

If you aren't aware of Maia-mailguard - you should be, as it's one of the best anti-spam solutions out there...

regards

Robert





Date: Sat, 04 Sep 2004 06:55:49 -0700
From: Robert LeBlanc <rjl@renaissoft.com>
Organization: Renaissoft, Inc.

It's been a while since I've sat down to put my thoughts on paper about where Maia is headed, so I figured I'd take a little time now to do just that.

In the near-term, of course, the goal is to get a "final" 1.0 version of Maia out there, including support for Smarty templates and JpGraph charts. This is coming along nicely, and we should be on track for that release before the end of this month.

What I really want to talk about here, though, are the longer-term design goals for Maia. I'll define these as "mid-term" (1.x) and "long-term" (2.x).

Mid-Term Features:

(1) Automated site-specific DNSBL

With the addition of the rbldnsd package, some additional Perl scripts and code to extract the connecting peer's IP address from mail headers, it will be possible to maintain a database table of offending IP addresses. The database would store information about the earliest connection attempt from a given IP address, the most recent connection attempt, and the total number of connection attempts to date. From this data, a Perl script can create a DNSBL zone file at regular intervals for rbldnsd to use locally. This local DNSBL can then be used either by your upstream MTA for blocking, or by SpamAssassin for scoring purposes.
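
As a rough illustration of the bookkeeping described above, here is a minimal Python sketch (the message itself mentions Perl scripts; Python is used here only for brevity). The `OffenderRecord` fields, the in-memory dict standing in for the database table, and the `record_attempt` and `render_zone` names are all assumptions made for this example. The output is a plain one-address-per-line listing; the exact dataset syntax rbldnsd expects should be checked against its documentation.

```python
from dataclasses import dataclass
from datetime import datetime
import ipaddress


@dataclass
class OffenderRecord:
    """One row of the hypothetical offending-IP table."""
    ip: str
    first_seen: datetime   # earliest connection attempt
    last_seen: datetime    # most recent connection attempt
    attempts: int = 1      # total connection attempts to date


def record_attempt(table, ip, when):
    """Insert or update the record for `ip` in an in-memory table (a dict)."""
    ip = str(ipaddress.ip_address(ip))  # validates and normalises the address
    rec = table.get(ip)
    if rec is None:
        table[ip] = OffenderRecord(ip, when, when, 1)
    else:
        rec.first_seen = min(rec.first_seen, when)
        rec.last_seen = max(rec.last_seen, when)
        rec.attempts += 1
    return table[ip]


def render_zone(table):
    """Render the table as one address per line, sorted numerically."""
    lines = ["# generated from the local offender table"]
    for ip in sorted(table, key=lambda a: int(ipaddress.ip_address(a))):
        lines.append(ip)
    return "\n".join(lines) + "\n"
```

A periodic job could call `render_zone` and write the result to the file the local DNSBL daemon reads.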

Extra scripts would then allow you to do things like auto-expire entries from this database according to a configurable set of criteria--e.g. how long it's been since the last connection attempt from that IP address, how many pieces of confirmed spam/viruses have been received from that address, etc. A first-time offender, for instance, might only stay on the list for 24 hours, whereas a second offense might extend its stay on the list for 3 days, a third offense would extend it for 7 days, and so on, in an exponential progression of some sort.
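
One way to read the 24-hour, 3-day, 7-day example is as a listing period of 2**N - 1 days after the Nth offense. The Python sketch below assumes that formula; it is only one possible "exponential progression", and the cap on the exponent and the function names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_EXPONENT = 12  # assumed cap, so a listing never exceeds 4095 days


def listing_duration(offenses):
    """Time on the list after the Nth offense: 1, 3, 7, 15, ... days."""
    if offenses < 1:
        return timedelta(0)
    return timedelta(days=2 ** min(offenses, MAX_EXPONENT) - 1)


def is_expired(last_seen, offenses, now):
    """True once the listing period since the last attempt has elapsed."""
    return now - last_seen >= listing_duration(offenses)
```

An expiry script would then drop every record for which `is_expired` returns true; other criteria mentioned above, such as counts of confirmed spam or viruses, could be folded into the same check.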

Additionally, that local DNSBL could be used to do more extensive kinds of blocking, based on the "severity level". Perhaps a site that has only been listed 1-10 times should only have mail service denied, but at 11+ offenses you might want to auto-add such addresses to a packet-level filter (e.g. IPTables, IPChains, etc.) to block other protocols as well. With the information conveniently databased, scripts can access that data and do whatever you want with it.

Obviously this would also require some sort of IP whitelisting facility as well, to ensure that certain "known-good" IP addresses are never listed under any circumstances.
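
The severity levels and the whitelist can be combined into a single decision, sketched below in Python. The threshold comes from the 1-10 / 11+ example above; the whitelist contents, the action names, and `choose_action` itself are assumptions for illustration only.

```python
import ipaddress

# Hypothetical whitelist of "known-good" networks (a documentation range is used here).
WHITELIST = [ipaddress.ip_network("192.0.2.0/28")]
PACKET_FILTER_THRESHOLD = 11  # from the 1-10 / 11+ example in the text


def choose_action(ip, offenses, whitelist=WHITELIST):
    """Return "allow", "deny-mail" or "packet-filter" for a connecting address."""
    addr = ipaddress.ip_address(ip)
    # Whitelisted addresses are never listed, whatever their offense count.
    if any(addr in net for net in whitelist):
        return "allow"
    if offenses >= PACKET_FILTER_THRESHOLD:
        return "packet-filter"
    if offenses >= 1:
        return "deny-mail"
    return "allow"
```

Checking the whitelist first means a known-good address stays unlisted no matter how many offenses have been recorded against it.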

(2) SpamCannibal integration

Following closely on the first feature (1), it's not a big step at all to (optionally) integrate Maia with the SpamCannibal tarpit network <http://www.spamcannibal.org/>. SpamCannibal uses a similar database-driven list of IP addresses, but rather than outright blocking of mail items it tarpits connections from listed addresses instead (a.k.a. throttling, teergrubing, greylisting, etc.). The basic idea is that the connecting peer is slowed to a crawl by having to send its mail in tiny pieces with lengthy delays in between, and no packet acknowledgements. This tactic has proven to be quite effective against DDoS attacks, and coincidentally works quite well against spam and worm-spew from zombies (a.k.a. open proxies).

The SpamCannibal solution itself is just one implementation of this kind of mechanism, and may not be the one that Maia ultimately mates with, simply because it relies on the Linux IPTables facility to do the work. This is an elegant solution, since it's MTA-independent, but it's also not particularly portable. I'm open to suggestions for more portable solutions, of course.

One of the key advantages to tarpitting is that it ties up the spammer's resources for hours, days, or even months. This prevents the connecting peer from making repeated attempts (it still thinks the first attempt is succeeding, albeit slowly), and limits the resources it has available with which to spam other people.

SpamCannibal maintains a network of shared data for its DNSBL, however, with SpamCannibal sites submitting new records to this shared database and downloading current versions of the database at regular intervals. I had plans of my own of that sort, with the Maia Network (see (3) below), but there's no reason Maia sites can't share data with the SpamCannibal network as well. This would provide yet another collaborative spam-reporting option for sites running Maia.

(3) The Maia Network

One of the originally-stated design goals for Maia was to leverage the data being gathered by other Maia users in such a way that this data could help others in their war on spam and malware. As we've seen from services like Razor, Pyzor, DCC, and DNSBLs, collaborative networks are powerful things. Each of these is slightly different from the others, either in terms of how the data is gathered or what type of data is gathered. With the Maia Network I propose to add something new in both respects.

First of all, the Maia Network will serve as a source for aggregate statistics, as gathered by all the participating Maia sites around the world (participation will be optional, of course). From a data-mining perspective, this allows us to chart global and regional trends, and get a "big picture" view of what's going on out there.

Second, by sharing information from local Maia DNSBLs, a "master database" can be compiled and used to form a proper Maia DNSBL, which can then be consulted by participating Maia sites and used for MTA blocking or SpamAssassin scoring.

Third, and perhaps most ambitiously, by sharing tokenized data from Bayes databases, we have the ability to assemble a "master Bayes database" that serves as an ever-growing corpus of spam and ham, but without privacy concerns. Since the data is tokenized before being uploaded to the Maia Network, no actual e-mail is sent, just the handful of tokens (words) that registered highly as ham or spam in your Bayes database. Think of this as a site-wide Bayes taken to an even higher level--a "global Bayes". This is effectively what the SpamAssassin folks use to do their scoring analyses (though in their case they use the actual e-mails and run them through a "mass-check" script to extract the tokens). By just uploading the tokens, we save bandwidth and eliminate a serious privacy concern, while ending up with the same data. This would let us do interesting things, such as auto-generate balanced scores for SpamAssassin rules, based on the spam and ham received by participating Maia sites. Those sites could then download new balanced score sets at regular intervals, without having to wait months for the next official SpamAssassin release.
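
To make the "upload only the strongest tokens" idea concrete, here is a small Python sketch. It assumes the local Bayes data is available as a mapping from token to spam probability, which is a simplification for this example and not SpamAssassin's actual storage format; the threshold, the limit, and the function name are likewise hypothetical.

```python
def select_tokens(token_probs, threshold=0.4, limit=50):
    """Pick the tokens whose spam probability is furthest from the neutral 0.5."""
    strong = [
        (tok, p) for tok, p in token_probs.items()
        if abs(p - 0.5) >= threshold
    ]
    # Strongest evidence first; ties are broken alphabetically for stable output.
    strong.sort(key=lambda item: (-abs(item[1] - 0.5), item[0]))
    return strong[:limit]
```

Only the selected (token, probability) pairs would be uploaded, never the messages they came from.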

Fourth, the genetic algorithms used to do the score-balancing described above can be distributed across participating Maia sites, in order to get the processing done in less time. The SpamAssassin folks currently take up to four *weeks* to balance scores, due to the processing-intensive nature of the algorithms involved. That time could be reduced considerably by parallelizing the operations and distributing them. Having a supercomputing cluster handy for this task would be nice, but it's hardly necessary, really--a few dozen machines around the world that volunteer to contribute some spare cycles could do the job just as effectively. The more machines we have participating, obviously, the more frequently we'll be able to issue new balanced score sets.

(4) Enhanced reporting options

If you've been following this list for any length of time, you'll realize that I'm generally not content with solutions that merely shield users from spam and malware. A lot of anti-spam and anti-virus packages do wonderful things to make the problem seem to "go away" for the people they protect, but do very little to attack the problem itself. (A classic example of this kind of thinking is the so-called "challenge/response" system, which shields users from spam and worm-spew but at the cost of making the problem worse for everyone else.)

This is why I'm fond of collaborative reporting networks like Razor, DCC, Pyzor, and SpamCannibal--by sharing data with these networks, others out there can benefit as well. Taking things a step further, though, it seems to me that we can be doing a lot more with the "evidence" that we're gathering in our quarantines and our spamtraps. We've gone to the trouble of collecting and classifying the mail, after all, so why shouldn't we get as much mileage as we can out of it before we throw it away?

SpamCop comes to mind as one good place to file such reports, although reporting in bulk to SpamCop isn't free, and their free reporting service was severely rate-limited, last time I checked. Having said that, why couldn't Maia do the same kind of job itself? The bulk of the work is header analysis, trying to identify forged headers and the point of injection of the e-mail into the mail system, in order to determine who the appropriate abuse contact(s) are. The rest of the SpamCop system consists of a complaint-tracking mechanism with tie-ins to a DNSBL for offenders who choose to ignore the complaints. Not difficult to implement with clickable URLs with hashed tokens that uniquely identify a complaint record in the database, and a web form to let the abuse contact respond to the complaint (just as SpamCop does).
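
The "clickable URLs with hashed tokens" could be built along the following lines. This Python sketch uses an HMAC over the complaint ID so that tokens cannot be guessed without the site secret; the secret, the placeholder URL, and the function names are assumptions for illustration, not a description of how SpamCop does it.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"                        # hypothetical site secret
BASE_URL = "https://maia.example.org/complaint"  # placeholder URL


def complaint_token(complaint_id):
    """Derive an unguessable hex token from the complaint's database ID."""
    msg = str(complaint_id).encode("ascii")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def complaint_url(complaint_id):
    """Build the link sent to the abuse contact."""
    return f"{BASE_URL}?id={complaint_id}&token={complaint_token(complaint_id)}"


def verify(complaint_id, token):
    """Check a token from an incoming request using a constant-time comparison."""
    expected = complaint_token(complaint_id).encode("utf-8")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

The web form would call `verify` before showing or updating a complaint record.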


Long-Term Features:

In the longer term (i.e. Maia 2.x), I have it in mind to redesign Maia from the ground up as a tightly-integrated spam, malware, and content-filtering system.

(1) Replace amavisd-new with maiad

While the 1.x series is based on amavisd-new (and patches amavisd-new quite heavily), I expect that by 2.x I will have replaced amavisd-new with something purpose-built for Maia's needs (e.g. "maiad"). To be clear, this is not because amavisd-new is at all a poor product; it's simply that its design goals are somewhat different from Maia's, and consequently I've had to work around (or address) some of its limitations to get it to do what Maia needs. Eventually the patching effort becomes too cumbersome to maintain, and it makes more sense to start from scratch.

(2) Plug-in architecture

The eventual maiad will be designed with a plug-in architecture in mind, so that "filter modules" can be written by third parties and combined by the end user as desired, much the same as Apache supports modules for additional functionality. The principle is simply that maiad is a mail processor--it receives mail from an upstream MTA, calls one or more filter modules, then passes the filtered result to a downstream MTA. The selection of filter modules, the order in which they're applied, and the logic that dictates the flow of mail through these filters (i.e. the finite-state machine) should all be configurable. Special "dummy filter" modules can add non-filter functionality as necessary.
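
A very small Python sketch of the filter-module idea follows. It models each module as a function that returns a verdict and runs them as a simple linear chain, which is much simpler than the configurable finite-state machine described above; the `Message` type, the verdict strings, and the example filters are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    body: str
    headers: dict = field(default_factory=dict)


# A filter takes a Message and returns "pass", "quarantine" or "reject".

def size_filter(msg):
    return "reject" if len(msg.body) > 1_000_000 else "pass"


def keyword_filter(msg):
    return "quarantine" if "viagra" in msg.body.lower() else "pass"


def tagging_filter(msg):
    """A "dummy filter": adds a header but never blocks anything."""
    msg.headers["X-Filtered-By"] = "sketch"
    return "pass"


def run_pipeline(msg, filters):
    """Apply filters in order; stop at the first verdict other than "pass"."""
    for f in filters:
        verdict = f(msg)
        if verdict != "pass":
            return verdict
    return "deliver"
```

The selection and order of modules is just the list passed to `run_pipeline`, for example `[size_filter, tagging_filter, keyword_filter]`.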

(3) Headless API

Maia 2.x will also be essentially "headless"--that is, it will not be designed with the web browser as its sole interface. Instead it will be built as a library of functions (an API) that can be called from a variety of sources--PHP scripts, Java applets, C/C++ applications, and so on. This will make it much easier to integrate Maia into other software packages (e.g. Squirrelmail, Horde Imp, cPanel, webmin, Outlook, Eudora, Thunderbird, etc.). That said, a PHP-based interface will still be provided with the distribution, but it will be built on top of the API.

(4) Appliance considerations

There's a lot of interest in using Maia in an anti-spam/malware appliance context, essentially combining all of the necessary packages into a custom distribution that fits neatly on a CD with a minimal operating system, so that a machine can be booted (or preinstalled) with this CD to serve as a standalone appliance. A number of the currently-popular "spam firewalls" out there, such as the Barracuda product, are built around a SpamAssassin core, just as Maia is, but lack much of Maia's advanced functionality. The headless API design of Maia 2.x will make it easy to assemble different kinds of distribution bundles--one for a standalone box, another for an array scenario, etc.

(5) Support for both MX-redirects and POP/IMAP-redirects

To my surprise, some of the most interested Maia users have turned out to be companies that do offsite filtering of e-mail for downstream clients. Generally this is done by redirecting the MX records for the client's domain to point to the filtering company, and Maia works fine for this purpose. Another segment of this market, though, is the individual customer who wants to have the e-mail to his specific address filtered--something typically done by redirecting his POP/IMAP account so that the filtering company fetches his mail for him and filters it, and he then picks up the mail from the filtering company's POP/IMAP server instead. Maia can handle this as well, with the use of something like fetchmail to get the POP/IMAP mail and feed it by SMTP to amavisd, but it's not something Maia was designed to do, so some outside scripts and hacks are required. With Maia 2.x, there will be administration tools to handle this type of individual customer as well as domain-based customers.

(6) Seek corporate sponsorship

An undertaking the size of Maia 2.x is non-trivial, and if I (and possibly a small team of developers) are to be able to devote more than just our spare time to this project, some sort of sponsorship will be necessary. Companies interested in having their name and branding associated with the Maia Mailguard project as sponsors have the ability to garner some goodwill from the community and help fund a full-time development effort. Work that might take a year of spare-time effort could conceivably be finished in 4-6 months if there were funding in place from sponsors. Interested parties should feel free to contact me privately to discuss this further.

Prioritised commercial support options are also being considered, for those businesses that need or want more than basic mailing list support; this model seems to have worked well enough for the MySQL developers.

To be clear, though, I want Maia 2.x to remain a free and open source product. I may experiment at some point with differently-licensed versions for different sorts of commercial users/applications (e.g. an appliance license, a third-party filtering license, etc.), but a free option (perhaps with branding required) should always remain.


--
"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man." -- George Bernard Shaw

Robert LeBlanc <rjl@renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.renaissoft.com/maia/>
_______________________________________________
Maia-users mailing list
Maia-users@renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users

--
###
Robert Guerra <rguerra@privaterra.org>
Privaterra - <http://www.privaterra.org>


