Re: real time content filtering - is there hope?

On Fri, May 26, 2006 at 09:10:04AM +0000, Andy Smith wrote:
> There's no such thing as a free lunch and content filtering is the
> most expensive part of accepting an email. If you want to do it
> during the SMTP conversation then this is going to slow down the
> rate at which you can accept and deal with mails, but it's probably
> still worth it.

well, no. the problem is bigger than just slowing down the rate at which
your system can accept mail: it actually makes mail delivery to your
system unreliable. if the content-filtering of each message takes too
long (e.g. because the system has received a large burst of email within
a short time, or because your server is just very busy for some other
reason) then the SMTP session could time out while the sender is waiting
for a 2xx accept code or 5xx reject code from your server.

if this happens, then the sender will (if it is a properly configured
MTA) attempt to re-send the message later. if the cause of the slowness
was a large burst of email, then the messages in that burst will
probably all suffer the same timeout fate, and all be re-tried later -
perhaps resulting in the same timeout problems. i.e. you've just created
a positive feedback loop which almost guarantees excessive load on your
mail server.
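
to make the feedback loop concrete, here is a toy model in python. it is
only a sketch: the tick-based timing, the `capacity` figure and the rule
that every session in an overloaded tick times out are simplifying
assumptions, not measurements from a real MTA (real senders also spread
their retries out). the function names are made up for illustration.

```python
from collections import defaultdict


def simulate_inline(burst, steady, capacity, retry_delay, ticks):
    """Messages accepted per tick when scanning happens during SMTP.

    Assumption: if more sessions are open in a tick than the scanner can
    finish before the sender's timeout, they all time out and are retried
    together `retry_delay` ticks later.
    """
    arrivals = defaultdict(int)
    arrivals[0] = burst
    accepted = []
    for t in range(ticks):
        load = arrivals[t] + steady
        if load <= capacity:
            accepted.append(load)
        else:
            accepted.append(0)                 # every session timed out
            arrivals[t + retry_delay] += load  # and will come back later
    return accepted


def simulate_queue_first(burst, steady, capacity, ticks):
    """Scan backlog per tick when mail is queued first and scanned after."""
    backlog = 0
    history = []
    for t in range(ticks):
        backlog += steady + (burst if t == 0 else 0)  # always accepted
        backlog -= min(backlog, capacity)             # scan what we can
        history.append(backlog)
    return history


if __name__ == "__main__":
    print(simulate_inline(500, 20, 100, 3, 10))
    print(simulate_queue_first(500, 20, 100, 10))
```

with these (invented) numbers the inline model never gets the burst
through: it comes back every three ticks, slightly bigger each time,
while the queue-first model works its backlog down to zero within a few
ticks and no sender ever sees a timeout.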

content filtering at the smtp stage would be nice, but nowhere near as
nice as a mail server that works all of the time rather than one that
only works some of the time. i.e. the risk ISN'T worth it. a properly
designed system MUST consider the worst-case scenario, and if it can't
cope with that then the design is either broken or inadequate. in this
case, broken. 'part broken' or 'broken sometimes' is still just plain
broken.

IMO, it is better to accept the mail into the queue, scan it with
amavis/clamav/spamassassin/etc and either Tag+Deliver or Discard
detected spam/viruses (*do not* bounce, as the sender address is most
likely forged).

i do both: tag+deliver if the SA score is lower than 10, discard if the
score is >=10 or if clamav detects a virus. in my experience, false
positives with a score over 10 are extremely rare.
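
the policy above fits in a few lines of python. this is just a sketch of
the decision, not amavisd configuration: `decide`, `Action` and
`DISCARD_SCORE` are hypothetical names, and the score and virus flag are
assumed to come from whatever scanner you run after the mail is queued.

```python
from enum import Enum


class Action(Enum):
    TAG_AND_DELIVER = "tag+deliver"
    DISCARD = "discard"
    # deliberately no BOUNCE action: the sender address can't be trusted


DISCARD_SCORE = 10.0  # threshold used in the post; tune for your own site


def decide(sa_score: float, virus_found: bool) -> Action:
    """Pick what to do with a message that is already in the queue."""
    if virus_found or sa_score >= DISCARD_SCORE:
        return Action.DISCARD
    return Action.TAG_AND_DELIVER


if __name__ == "__main__":
    for score, virus in [(3.2, False), (10.0, False), (0.0, True)]:
        print(score, virus, decide(score, virus).value)
```

because this runs after the SMTP session has already finished with a
2xx, a slow scan only delays delivery. it can't cause a timeout.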

ps: yes, you can reduce the chance of the above happening by using
extremely fast disks (e.g. ramdisk or solid-state disks) for the MTA
queue directory and for the temp directories used by amavisd/clamav/SA
and by having very fast CPUs. you can't, however, eliminate the risk
entirely.

pps: i noticed recently that Gigabyte make a nice PCI card
solid-state-disk which can be populated with up to 4 GB of ram. it just
uses the PCI slot for power and plugs in as an IDE or SATA drive. it
has approx 16 hour battery backup. would make a very nice queue dir for
postfix or whatever, and only about $AUD400 plus about $AUD100 per GB
for the ram (at current australian prices). that's dirt cheap for an SSD.

personally, i'd use one of these for the postfix queue (where surviving
a power failure is important) and a ramdisk (linux tmpfs) for the amavis
etc temp directory (where surviving power-failure isn't).
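
if you go the tmpfs route it's worth checking that the temp directory
really did end up on tmpfs. a rough python sketch follows. it assumes
linux's /proc/mounts format (device, mount point, fs type, ...), the
paths in the example are made up, and `fs_type_for` is a hypothetical
helper, not part of any of the tools above.

```python
import posixpath


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is text in /proc/mounts format. Mount points containing
    spaces (escaped as \\040 by the kernel) are not handled here.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # the longest matching mount point wins; on a tie the later
            # line wins, since it is mounted over the earlier one
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = f.read()
    # example path only; use your own scanner temp directory
    print(fs_type_for("/var/lib/amavis/tmp", mounts))
```
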
craig sanders <firstname.lastname@example.org> (part time cyborg)