Sarge SMTP Performance
I just migrated my company's anti-virus/anti-spam mail gateway from a
Redhat 7.3 box with kernel 2.4.20-24.7smp to Debian Sarge with kernel […].
I had hoped that this transition would lead to better performance (a newer
version of Perl, better drivers in the kernel and so on), but performance has
instead dropped by about 30%.
Here is my setup:
IBM 335 with dual 2.4 GHz Xeons, 1 GB RAM and one 10,000 RPM SCSI disk.
Minimal Debian Sarge install (all unnecessary services turned off).
ReiserFS on all partitions (except /boot).
Qmail MTA configured to accept 70 concurrent incoming connections.
ClamAV running as a daemon.
10 SpamAssassin daemons (spamd).
On my old Redhat system the hardware could scan around 60,000 emails
per hour with an average scan time of 5.6 seconds (including the time spent
in both ClamAV and SpamAssassin) and an average load of 23.7.
My new Sarge installation on the same hardware scans 40,000 emails per
hour with an average scan time of 4.8 seconds, but with a load average of 57.8.
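As a quick sanity check, the Python lines below only re-derive ratios from the figures quoted above (no new measurements). They put the throughput drop at roughly a third, and show the paradox directly: each message is scanned faster while the box as a whole moves fewer messages at a much higher load.

```python
# Figures quoted above: old Redhat 7.3 box vs. new Sarge box, same hardware.
old_per_hour, new_per_hour = 60_000, 40_000
old_scan_s, new_scan_s = 5.6, 4.8
old_load, new_load = 23.7, 57.8

drop = 1 - new_per_hour / old_per_hour        # relative throughput loss
scan_change = 1 - new_scan_s / old_scan_s     # how much shorter a single scan is
load_ratio = new_load / old_load              # how much higher the load average is

print(f"throughput drop: {drop:.1%}")                                    # 33.3%
print(f"rates: {old_per_hour / 3600:.1f}/s -> {new_per_hour / 3600:.1f}/s")  # 16.7/s -> 11.1/s
print(f"scan time shorter by: {scan_change:.1%}")                        # 14.3%
print(f"load ratio: {load_ratio:.2f}x")                                  # 2.44x
```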
Interestingly, if I time the internal handling of each email, Sarge
seems to win (the numbers below are calculated from 4 days of mail flow,
about 3.9 million emails):
1) Spam scanning is about 18% faster than on the old Redhat system.
2) Perl handling of the email is about 12% faster.
3) ClamAV is scanning 8% faster.
On the down side, Sarge loses in the following categories:
1) Unpacking email and attachments with ripMIME and the unpackers (unzip,
unrar...). This step used to average 0.075 seconds on my old
Redhat system; now the average is about 2 seconds (note that this can
drop if I renice the parent process responsible for calling the
unpackers, but then other things start to take up time, usually […]).
2) The number of connections that time out on the SMTP service is 30%
higher than on the Redhat system.
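To compare unpacking times with and without renicing, one option is to time each unpacker call on its own. The sketch below uses a hypothetical helper, run_timed (not part of ripMIME or any tool named above). Unlike the renice of the parent process described above, it lowers the priority of the spawned unpacker only, via os.nice() in the child. The unzip command line in the comment is illustrative only.

```python
import os
import subprocess
import sys
import time


def run_timed(cmd, niceness=0, timeout=60):
    """Run one external command (e.g. an unpacker); return (seconds, stdout text).

    If niceness is non-zero, the child calls os.nice(niceness) before exec,
    so only the unpacker is deprioritised, not the process that launched it.
    Positive values lower priority; negative values normally need root.
    """
    def adjust_priority():
        if niceness:
            os.nice(niceness)

    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        preexec_fn=adjust_priority,  # POSIX only; runs in the child after fork
        timeout=timeout,
        check=True,  # raise CalledProcessError if the command fails
    )
    return time.perf_counter() - start, result.stdout.decode()


if __name__ == "__main__":
    # Stand-in command that just reports its own nice value. A real call might be
    # run_timed(["unzip", "-qq", "-o", "attachment.zip", "-d", "/tmp/unpack"]),
    # where the file and directory names are hypothetical.
    secs, out = run_timed([sys.executable, "-c", "import os; print(os.nice(0))"], niceness=5)
    print(f"child nice value {out.strip()}, took {secs:.3f}s")
```

Logging these per-call times with and without a niceness value would show whether the 2-second average comes from the unpackers themselves or from waiting for CPU.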
These numbers lead me to think that the system cannot handle as many
emails as before, either because it simply does not accept enough connections
(e.g. the connections time out on the SMTP port before even reaching
the scanners) or because filesystem performance has dropped. To
pursue this idea I have tried the following:
1) Changing the filesystem to XFS, and to ext3.
2) Running ReiserFS with notail, nodiratime and noatime.
3) Renicing qmail-smtpd so it gets higher priority than spamd (hoping
that this would lead to more connections getting handled).
4) Changing the I/O scheduler to deadline (elevator=deadline).
5) Changing the kernel to 2.4.27-i686-SMP.
6) Turning the firewall (iptables) completely off.
7) Tuning TCP performance according to the Linux TCP Tuning […]
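To confirm that the mount options from item 2 are actually in effect, one can read /proc/mounts. This is a minimal sketch with illustrative function names. One caveat: depending on the kernel version, ReiserFS-specific options such as notail may not be listed in /proc/mounts at all, so a "missing" notail is not conclusive on its own.

```python
import os

WANTED = ("notail", "noatime", "nodiratime")


def parse_mounts(text):
    """Map mount point -> (fstype, set of options) from /proc/mounts-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        table[mountpoint] = (fstype, set(options.split(",")))
    return table


def missing_options(text, fstype="reiserfs", wanted=WANTED):
    """Return {mountpoint: [wanted options not shown]} for mounts of one fstype."""
    report = {}
    for mountpoint, (kind, options) in parse_mounts(text).items():
        if kind != fstype:
            continue
        missing = [opt for opt in wanted if opt not in options]
        if missing:
            report[mountpoint] = missing
    return report


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as fh:
        print(missing_options(fh.read()))
```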
None of it has worked. And yes, I do get 60,000 incoming connections per
hour; most of them just seem to time out and get handled by the next MX
in my DNS.
Note that the DNS server I use is the same one the old
Redhat system used, and name resolution performs equally on both systems.
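One way to put numbers on "performs equally" is to time lookups from each box. This minimal sketch times forward lookups with socket.getaddrinfo(); the function name and the "localhost" placeholder are illustrative, and socket.gethostbyaddr() could be timed the same way for reverse lookups.

```python
import socket
import time


def time_lookup(host, port=25, repeats=3):
    """Return wall-clock times (seconds) for repeated forward lookups of host."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute names the gateway actually resolves.
    print(["%.4f" % t for t in time_lookup("localhost")])
```

Running the same script on both systems against the same names gives directly comparable figures.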
To see if the server could take the load on its own, I tried
changing my MX records to list only this one server. This made the load
jump to 98.9, and the server eventually died with around 55
defunct Perl processes floating around. My old Redhat server could
handle being the only mail server just fine (with a load average around […]).
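To track the defunct Perl processes over time rather than spotting them in ps output, one could count zombies straight from /proc. A minimal, Linux-only sketch with illustrative function names. It relies on the /proc/&lt;pid&gt;/stat layout described in proc(5): the pid, then the command name in parentheses, then a one-letter state where Z means zombie.

```python
import os


def parse_stat(stat):
    """Return (command name, state letter) from one /proc/<pid>/stat line.

    The name sits between the first '(' and the last ')' (it may itself
    contain spaces or parentheses); the state letter comes right after.
    """
    name = stat[stat.find("(") + 1 : stat.rfind(")")]
    state = stat[stat.rfind(")") + 1 :].split()[0]
    return name, state


def read_stats(proc_root="/proc"):
    """Yield the stat line of every process currently listed under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                yield fh.read()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or its name is not valid text


def count_zombies(stat_lines):
    """Count defunct (state 'Z') processes, grouped by command name."""
    counts = {}
    for line in stat_lines:
        name, state = parse_stat(line)
        if state == "Z":
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__" and os.path.isdir("/proc"):
    print(count_zombies(read_stats()))
```

Sampling this every few seconds while the load climbs would show whether the zombies accumulate gradually or appear in bursts.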
So as it is now, I am a bit baffled by the slowness of Sarge, because
all the other systems I have converted to Sarge and kernel 2.6 have
run significantly faster (database servers, web servers, name […]).
So my question is this: does anyone know of any limitation in Sarge
(e.g. a default limit on incoming connections, not that I have ever heard of
such a thing) that would cause my system to degrade the way it
has? When I telnet to port 25 I simply do not get a connection
fast enough (most attempts time out), which leads me to suspect that
something is wrong.
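The telnet test can be scripted so that slow or timed-out connections are counted rather than eyeballed. This minimal sketch (probe_smtp is an illustrative name) separates the TCP connect time from the time until the first line of the SMTP greeting arrives. That distinction may help tell a connection that never completes apart from one that connects but waits for the banner. The host is taken from the command line; nothing is probed without one.

```python
import socket
import sys
import time


def probe_smtp(host, port=25, timeout=10.0):
    """Return (connect_s, banner_s, banner_line) for one SMTP connection attempt.

    connect_s: time until the TCP connection is established.
    banner_s:  time until the first greeting line (e.g. "220 ...") is read.
    Raises OSError (including socket.timeout) if either step fails or times out.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.perf_counter()
        with sock.makefile("rb") as fh:
            line = fh.readline()
        greeted = time.perf_counter()
    return connected - start, greeted - start, line.decode("ascii", "replace").rstrip("\r\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    try:
        c, b, banner = probe_smtp(sys.argv[1])
        print(f"connect {c:.3f}s, banner {b:.3f}s: {banner}")
    except OSError as exc:
        print(f"probe failed: {exc}")
```

Running it in a loop from another machine and logging the two times would give a timeout rate that can be compared before and after each of the changes listed above.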
Another explanation could of course be that the drivers in the Sarge kernel
are buggier than the old ones in Redhat kernel 2.4.20-24.7smp; I have
still not investigated this fully.