Maia Roadmap

To: debian-nonprofit@lists.debian.org
Subject: Maia Roadmap
From: Robert Guerra <rguerra@lists.privaterra.org>
Date: Sat, 4 Sep 2004 12:12:38 -0400
Message-id: <p06200a14bd5f992ed9cc@[192.168.3.13]>
In-reply-to: <5852868B65F75549A932442514644E100221470E@cubert.internal.mott.org>
References: <5852868B65F75549A932442514644E100221470E@cubert.internal.mott.org>

I'm forwarding the recent message posted to the maia-mailguard mailing list, which might be of interest to the Linux users and developers on this list.

If you aren't aware of Maia-mailguard, you should be, as it's one of the best anti-spam solutions out there...


regards

Robert





Date: Sat, 04 Sep 2004 06:55:49 -0700
From: Robert LeBlanc <rjl@renaissoft.com>
Organization: Renaissoft, Inc.

It's been a while since I've sat down to put my thoughts on paper about where Maia is headed, so I figured I'd take a little time now to do just that.

In the near term, of course, the goal is to get a "final" 1.0 version of Maia out there, including support for Smarty templates and JpGraph charts. This is coming along nicely, and we should be on track for that release before the end of this month.

What I really want to talk about here, though, are the longer-term design goals for Maia. I'll define these as "mid-term" (1.x) and "long-term" (2.x).


Mid-Term Features:

(1) Automated site-specific DNSBL

With the addition of the rbldnsd package, some additional Perl scripts, and code to extract the connecting peer's IP address from mail headers, it will be possible to maintain a database table of offending IP addresses. The database would store information about the earliest connection attempt from a given IP address, the most recent connection attempt, and the total number of connection attempts to date. From this data, a Perl script can create a DNSBL zone file at regular intervals for rbldnsd to use locally. This local DNSBL can then be used either by your upstream MTA for blocking, or by SpamAssassin for scoring purposes.
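A minimal sketch of the database side follows, written in Python rather than Perl for illustration. It makes several assumptions. SQLite (Python's built-in `sqlite3`) stands in for whatever database Maia would actually use. The `offenders` table and the function names are hypothetical. The zone is written in what we understand to be rbldnsd's `ip4set` dataset format: a `:value:text` default line followed by one address per line. Check the rbldnsd documentation before relying on that format.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema: one row per offending IP address, holding the earliest
# attempt, the most recent attempt (Unix timestamps) and the running total.
SCHEMA = """
CREATE TABLE IF NOT EXISTS offenders (
    ip         TEXT PRIMARY KEY,
    first_seen INTEGER NOT NULL,
    last_seen  INTEGER NOT NULL,
    attempts   INTEGER NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_attempt(conn: sqlite3.Connection, ip: str, ts: Optional[int] = None) -> None:
    """Log one connection attempt from `ip` (e.g. an address parsed from mail headers)."""
    ts = int(time.time()) if ts is None else ts
    cur = conn.execute(
        "UPDATE offenders SET last_seen = MAX(last_seen, ?), attempts = attempts + 1 "
        "WHERE ip = ?",
        (ts, ip),
    )
    if cur.rowcount == 0:  # first attempt from this address
        conn.execute(
            "INSERT INTO offenders (ip, first_seen, last_seen, attempts) "
            "VALUES (?, ?, ?, 1)",
            (ip, ts, ts),
        )
    conn.commit()


def build_zone(conn: sqlite3.Connection, text: str = "Listed in local DNSBL") -> str:
    """Render the table as an rbldnsd ip4set-style dataset (format assumed)."""
    lines = [f":127.0.0.2:{text}"]  # default A record and TXT for the entries below
    lines += [ip for (ip,) in conn.execute("SELECT ip FROM offenders ORDER BY ip")]
    return "\n".join(lines) + "\n"
```

A periodic job could call `build_zone()` and write the result to the file rbldnsd serves.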

Extra scripts would then allow you to do things like auto-expire entries from this database according to a configurable set of criteria, e.g. how long it's been since the last connection attempt from that IP address, how many pieces of confirmed spam/viruses have been received from that address, etc. A first-time offender, for instance, might only stay on the list for 24 hours, whereas a second offense might extend its stay on the list to 3 days, a third offense to 7 days, and so on, in an exponential progression of some sort.
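One progression that fits the figures above is to list an address for 2^n - 1 days after its n-th offense. This reproduces exactly the 1, 3 and 7 days mentioned and then continues with 15, 31, and so on. Both the formula and the one-year cap are illustrative assumptions, not a settled Maia policy.

```python
SECONDS_PER_DAY = 86400


def listing_days(offenses: int, max_days: int = 365) -> int:
    """Days an address stays listed after its n-th offense: 2**n - 1, capped."""
    if offenses < 1:
        raise ValueError("offenses must be >= 1")
    return min(2 ** offenses - 1, max_days)


def is_expired(last_seen: int, offenses: int, now: int, max_days: int = 365) -> bool:
    """True once the listing period, counted from the most recent attempt, has elapsed."""
    return now >= last_seen + listing_days(offenses, max_days) * SECONDS_PER_DAY
```

An expiry script would delete entries for which `is_expired()` returns true before the next zone file is generated. The offense count could come from the attempt total or from the number of confirmed spam/virus items.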

Additionally, that local DNSBL could be used to do more extensive kinds of blocking, based on the "severity level". Perhaps a site that has only been listed 1-10 times should only have mail service denied, but at 11+ offenses you might want to auto-add such addresses to a packet-level filter (e.g. IPTables, IPChains, etc.) to block other protocols as well. With the information conveniently databased, scripts can access that data and do whatever you want with it.

Obviously this would also require some sort of IP whitelisting facility, to ensure that certain "known-good" IP addresses are never listed under any circumstances.
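The severity tiers and the whitelist check could be combined into a single policy function, sketched below. The threshold of 10 follows the example above. The action names and the iptables `DROP` rule are illustrative assumptions. The function only builds the iptables command as an argument list and never executes it.

```python
import ipaddress
from typing import Iterable, List

PACKET_FILTER_THRESHOLD = 10  # listed more often than this -> block all protocols


def is_whitelisted(ip: str, whitelist: Iterable[str]) -> bool:
    """True if `ip` falls inside any known-good network (CIDR strings)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in whitelist)


def actions_for(ip: str, times_listed: int, whitelist: Iterable[str]) -> List[str]:
    """Decide what to do with an address based on how often it has been listed."""
    if times_listed < 1 or is_whitelisted(ip, whitelist):
        return []  # known-good addresses are never acted upon
    actions = ["deny-mail"]
    if times_listed > PACKET_FILTER_THRESHOLD:
        actions.append("packet-filter")
    return actions


def iptables_drop(ip: str) -> List[str]:
    """Build (but do not run) an iptables rule dropping all traffic from `ip`."""
    return ["iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"]
```

Checking the whitelist inside the policy function, rather than only at zone-generation time, means no later action can accidentally target a known-good address.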


(2) SpamCannibal integration

Following closely on the first feature (1), it's not a big step at all to (optionally) integrate Maia with the SpamCannibal tarpit network <http://www.spamcannibal.org/>. SpamCannibal uses a similar database-driven list of IP addresses, but rather than blocking mail outright, it tarpits connections from listed addresses instead (a.k.a. throttling, teergrubing, greylisting, etc.). The basic idea is that the connecting peer is slowed to a crawl by having to send its mail in tiny pieces with lengthy delays in between, and no packet acknowledgements. This tactic has proven to be quite effective against DDoS attacks, and coincidentally works quite well against spam and worm-spew from zombies (a.k.a. open proxies).

The SpamCannibal solution itself is just one implementation of this kind of mechanism, and may not be the one that Maia ultimately mates with, simply because it relies on the Linux IPTables facility to do the work. This is an elegant solution, since it's MTA-independent, but it's also not particularly portable. I'm open to suggestions for more portable solutions, of course.

One of the key advantages of tarpitting is that it ties up the spammer's resources for hours, days, or even months. This prevents the connecting peer from making repeated attempts (it still thinks the first attempt is succeeding, albeit slowly), and limits the resources it has available with which to spam other people.

SpamCannibal maintains a network of shared data for its DNSBL, however, with SpamCannibal sites submitting new records to this shared database and downloading current versions of the database at regular intervals. I had plans of my own of that sort, with the Maia Network (see (3) below), but there's no reason Maia sites can't share data with the SpamCannibal network as well. This would provide yet another collaborative spam-reporting option for sites running Maia.


(3) The Maia Network

One of the originally-stated design goals for Maia was to leverage the data being gathered by other Maia users in such a way that this data could help others in their war on spam and malware. As we've seen from services like Razor, Pyzor, DCC, and DNSBLs, collaborative networks are powerful things. Each of these is slightly different from the others, either in terms of how the data is gathered or what type of data is gathered. With the Maia Network I propose to add something new in both respects.

First of all, the Maia Network will serve as a source for aggregate statistics, as gathered by all the participating Maia sites around the world (participation will be optional, of course). From a data-mining perspective, this allows us to chart global and regional trends, and get a "big picture" view of what's going on out there.

Second, by sharing information from local Maia DNSBLs, a "master database" can be compiled and used to form a proper Maia DNSBL, which can then be consulted by participating Maia sites and used for MTA blocking or SpamAssassin scoring.

Third, and perhaps most ambitiously, by sharing tokenized data from Bayes databases, we have the ability to assemble a "master Bayes database" that serves as an ever-growing corpus of spam and ham, but without privacy concerns. Since the data is tokenized before being uploaded to the Maia Network, no actual e-mail is sent, just the handful of tokens (words) that registered highly as ham or spam in your Bayes database. Think of this as a site-wide Bayes taken to an even higher level: a "global Bayes". This is effectively what the SpamAssassin folks use to do their scoring analyses (though in their case they use the actual e-mails and run them through a "mass-check" script to extract the tokens). By just uploading the tokens, we save bandwidth and eliminate a serious privacy concern, while ending up with the same data. This would let us do interesting things, such as auto-generate balanced scores for SpamAssassin rules, based on the spam and ham received by participating Maia sites. Those sites could then download new balanced score sets at regular intervals, without having to wait months for the next official SpamAssassin release.
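The upload and aggregation steps could look roughly like the following sketch. First, a per-token spam probability is computed from local Bayes counts. Only tokens that lean strongly one way are uploaded, together with their raw counts. The master database then simply sums the counts from all sites. Several things here are assumptions for illustration rather than SpamAssassin's actual Bayes internals: the probability estimate (a plain ratio of per-class frequencies), the 0.4 cutoff, and the `token -> (spam, ham)` payload shape.

```python
from typing import Dict, Iterable, Tuple

Counts = Dict[str, Tuple[int, int]]  # token -> (spam_count, ham_count)


def spam_probability(spam: int, ham: int, nspam: int, nham: int) -> float:
    """Frequency-ratio estimate of P(spam | token)."""
    s, h = spam / nspam, ham / nham
    return s / (s + h)


def select_tokens(counts: Counts, nspam: int, nham: int, cutoff: float = 0.4) -> Counts:
    """Keep only tokens whose probability lies at least `cutoff` away from 0.5."""
    if nspam < 1 or nham < 1:
        raise ValueError("need at least one spam and one ham message")
    picked: Counts = {}
    for token, (spam, ham) in counts.items():
        if spam + ham == 0:
            continue
        if abs(spam_probability(spam, ham, nspam, nham) - 0.5) >= cutoff:
            picked[token] = (spam, ham)
    return picked


def merge(uploads: Iterable[Counts]) -> Counts:
    """Combine uploads from many sites into a master table by summing counts."""
    master: Counts = {}
    for upload in uploads:
        for token, (spam, ham) in upload.items():
            s, h = master.get(token, (0, 0))
            master[token] = (s + spam, h + ham)
    return master
```

Only tokens and counts leave the site, never message bodies, which is the privacy property described above.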

Fourth, the genetic algorithms used to do the score-balancing described above can be distributed across participating Maia sites, in order to get the processing done in less time. The SpamAssassin folks currently take up to four *weeks* to balance scores, due to the processing-intensive nature of the algorithms involved. That time could be reduced considerably by parallelizing the operations and distributing them. Having a supercomputing cluster handy for this task would be nice, but it's hardly necessary, really; a few dozen machines around the world that volunteer to contribute some spare cycles could do the job just as effectively. The more machines we have participating, obviously, the more frequently we'll be able to issue new balanced score sets.


(4) Enhanced reporting options

If you've been following this list for any length of time, you'll realize that I'm generally not content with solutions that merely shield users from spam and malware. A lot of anti-spam and anti-virus packages do wonderful things to make the problem seem to "go away" for the people they protect, but do very little to attack the problem itself. (A classic example of this kind of thinking is the so-called "challenge/response" system, which shields users from spam and worm-spew but at the cost of making the problem worse for everyone else.)

This is why I'm fond of collaborative reporting networks like Razor, DCC, Pyzor, and SpamCannibal: by sharing data with these networks, others out there can benefit as well. Taking things a step further, though, it seems to me that we can be doing a lot more with the "evidence" that we're gathering in our quarantines and our spamtraps. We've gone to the trouble of collecting and classifying the mail, after all, so why shouldn't we get as much mileage as we can out of it before we throw it away?

SpamCop comes to mind as one good place to file such reports, although reporting in bulk to SpamCop isn't free, and their free reporting service was severely rate-limited, last time I checked. Having said that, why couldn't Maia do the same kind of job itself? The bulk of the work is header analysis: trying to identify forged headers and the point where the e-mail was injected into the mail system, in order to determine who the appropriate abuse contact(s) are. The rest of the SpamCop system consists of a complaint-tracking mechanism with tie-ins to a DNSBL for offenders who choose to ignore the complaints. That wouldn't be difficult to implement, using clickable URLs with hashed tokens that uniquely identify a complaint record in the database, and a web form to let the abuse contact respond to the complaint (just as SpamCop does).
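For the complaint-tracking piece, an HMAC of the complaint ID under a site secret is one common way to make the "hashed token" in each URL unguessable without storing it. The URL layout, the `maia.example.org` host and the function names below are hypothetical.

```python
import hashlib
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://maia.example.org/complaint"  # hypothetical response form
SITE_SECRET = secrets.token_bytes(32)  # generate once per site and keep private


def complaint_token(secret: bytes, complaint_id: int) -> str:
    """HMAC-SHA256 of the complaint ID; unguessable without the site secret."""
    return hmac.new(secret, str(complaint_id).encode(), hashlib.sha256).hexdigest()


def complaint_url(secret: bytes, complaint_id: int) -> str:
    """Clickable URL that identifies one complaint record."""
    query = urlencode({"id": complaint_id, "t": complaint_token(secret, complaint_id)})
    return f"{BASE_URL}?{query}"


def verify(secret: bytes, url: str) -> bool:
    """Check a submitted URL before loading the complaint record it names."""
    params = parse_qs(urlparse(url).query)
    try:
        complaint_id = int(params["id"][0])
        token = params["t"][0]
    except (KeyError, ValueError):
        return False
    expected = complaint_token(secret, complaint_id)
    return hmac.compare_digest(token.encode(), expected.encode())  # constant-time
```

Because the token is derived from the ID, the database only needs the complaint record itself. Rotating `SITE_SECRET` invalidates every outstanding URL at once.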



Long-Term Features:

In the longer term (i.e. Maia 2.x), I have it in mind to redesign Maia from the ground up as a tightly-integrated spam, malware, and content-filtering system.


(1) Replace amavisd-new with maiad

While the 1.x series is based on amavisd-new (and patches amavisd-new quite heavily), I expect that by 2.x I will have replaced amavisd-new with something purpose-built for Maia's needs (e.g. "maiad"). To be clear, this is not because amavisd-new is at all a poor product; it's simply that its design goals are somewhat different from Maia's, and consequently I've had to work with (or address) some of its limitations to get it to do what Maia needs. Eventually the patching effort becomes too cumbersome to maintain, and it makes more sense to start from scratch.


(2) Plug-in architecture

The eventual maiad will be designed with a plug-in architecture in mind, so that "filter modules" can be written by third parties and combined by the end user as desired, much the same as Apache supports modules for additional functionality. The principle is simply that maiad is a mail processor: it receives mail from an upstream MTA, calls one or more filter modules, then passes the filtered result to a downstream MTA. The selection of filter modules, the order in which they're applied, and the logic that dictates the flow of mail through these filters (i.e. the finite-state machine) should all be configurable. Special "dummy filter" modules can add non-filter functionality as necessary.
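The following is a rough Python model of that idea, not maiad code. Each filter module is a callable that may annotate the message and returns a verdict label. A configurable transition table (the finite-state machine) maps each (filter, verdict) pair to the next filter or to a final disposition. All filter names, verdicts and dispositions are made up for illustration.

```python
from typing import Callable, Dict, Tuple

Message = Dict[str, object]
Filter = Callable[[Message], str]         # returns a verdict label
Transitions = Dict[Tuple[str, str], str]  # (state, verdict) -> next state

FINAL_STATES = {"deliver", "quarantine", "discard"}


def run(msg: Message, filters: Dict[str, Filter], transitions: Transitions,
        start: str, max_steps: int = 50) -> str:
    """Drive `msg` through the configured filters until a final disposition."""
    state, steps = start, 0
    while state not in FINAL_STATES:
        if steps >= max_steps:
            raise RuntimeError("filter graph did not terminate (cycle in config?)")
        verdict = filters[state](msg)
        state = transitions[(state, verdict)]
        steps += 1
    return state


# Two toy filter modules.
def virus_filter(msg: Message) -> str:
    return "infected" if "EICAR" in str(msg["body"]) else "clean"


def spam_filter(msg: Message) -> str:
    spammy = "viagra" in str(msg["body"]).lower()
    msg["X-Spam-Flag"] = "YES" if spammy else "NO"  # filters may annotate mail
    return "spam" if spammy else "ham"


FILTERS = {"virus": virus_filter, "spam": spam_filter}
TRANSITIONS = {
    ("virus", "clean"): "spam",
    ("virus", "infected"): "quarantine",
    ("spam", "ham"): "deliver",
    ("spam", "spam"): "quarantine",
}
```

Reordering filters or adding a "dummy" module (say, one that only logs) then becomes a change to `FILTERS` and `TRANSITIONS` rather than to the dispatcher.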


(3) Headless API

Maia 2.x will also be essentially "headless"; that is, it will not be designed with the web browser as its sole interface. Instead it will be built as a library of functions (an API) that can be called from a variety of sources: PHP scripts, Java applets, C/C++ applications, and so on. This will make it much easier to integrate Maia into other software packages (e.g. Squirrelmail, Horde Imp, cPanel, webmin, Outlook, Eudora, Thunderbird, etc.). That said, a PHP-based interface will still be provided with the distribution, but it will be built on top of the API.


(4) Appliance considerations

There's a lot of interest in using Maia in an anti-spam/malware appliance context, essentially combining all of the necessary packages into a custom distribution that fits neatly on a CD with a minimal operating system, so that a machine can be booted (or preinstalled) with this CD to serve as a standalone appliance. A number of the currently-popular "spam firewalls" out there, such as the Barracuda product, are built around a SpamAssassin core, just as Maia is, but lack much of Maia's advanced functionality. The headless API design of Maia 2.x will make it easy to assemble different kinds of distribution bundles: one for a standalone box, another for an array scenario, etc.


(5) Support for both MX-redirects and POP/IMAP-redirects

To my surprise, some of the most interested Maia users have turned out to be companies that do offsite filtering of e-mail for downstream clients. Generally this is done by redirecting the MX records for the client's domain to point to the filtering company, and Maia works fine for this purpose. Another segment of this market, though, is the individual customer who wants to have the e-mail to his specific address filtered. This is typically done by redirecting his POP/IMAP account so that the filtering company fetches his mail for him and filters it, and he then picks up the mail from the filtering company's POP/IMAP server instead. Maia can handle this as well, using something like fetchmail to retrieve the POP/IMAP mail and feed it by SMTP to amavisd, but it's not something Maia was designed to do, so some outside scripts and hacks are required. With Maia 2.x, there will be administration tools to handle this type of individual customer as well as domain-based customers.


(6) Seek corporate sponsorship

An undertaking the size of Maia 2.x is non-trivial, and if I (and possibly a small team of developers) are to be able to devote more than just our spare time to this project, some sort of sponsorship will be necessary. Companies interested in having their name and branding associated with the Maia Mailguard project as sponsors have the ability to garner some goodwill from the community and help fund a full-time development effort. Work that might take a year of spare-time effort could conceivably be finished in 4-6 months if there were funding in place from sponsors. Interested parties should feel free to contact me privately to discuss this further.

Prioritised commercial support options are also being considered, for those businesses that need or want more than basic mailing list support; this model seems to have worked well enough for the MySQL developers.

To be clear, though, I want Maia 2.x to remain a free and open source product. I may experiment at some point with differently-licensed versions for different sorts of commercial users/applications (e.g. an appliance license, a third-party filtering license, etc.), but a free option (perhaps with branding required) should always remain.

--

"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man." -- George Bernard Shaw


Robert LeBlanc <rjl@renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.renaissoft.com/maia/>
_______________________________________________
Maia-users mailing list
Maia-users@renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users

--
###
Robert Guerra <rguerra@privaterra.org>
Privaterra - <http://www.privaterra.org>
