
Sarge Performance



Hi all

I just moved my company's anti-virus/anti-spam mail gateway from
Redhat 7.3 with kernel 2.4.24 to Debian Sarge with kernel 2.6.8.1. I
had hoped that this transition would lead to better performance (new
Perl, better drivers in the kernel and so on), but performance has
instead dropped by about 30%.

Here is my setup.

Hardware:
IBM 335 with dual 2.4 GHz Xeon, 1 GB RAM and one 10,000 RPM SCSI disk.


Software:

Minimal Debian Sarge (that is, I have turned all unnecessary services off).
Kernel 2.6.8.1-SMP.
Reiserfs on all file systems.
Qmail MTA configured with up to 70 incoming connections.
ClamAV running as a daemon.
10 Spamassassin daemons (spamd)
Qmail-scanner.

On my old Redhat system the hardware could scan around 60,000 emails
per hour with an average scan time of 5.6 seconds (including time from
both ClamAV and Spamassassin) and an average load of 23.7.

My new Sarge installation on the same hardware scans 40,000 emails per
hour with an average scan time of 4.8 seconds, but with a load average
of 57.8.
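
As a rough sanity check on these figures, Little's law (average number
in the system = arrival rate x average time in the system) gives the
number of messages that must be in the scanners at any instant. This is
only a sketch: the function names are made up for illustration, and
treating the quoted scan time as the whole time a message occupies a
slot is an assumption.

```python
# Back-of-the-envelope check using Little's law: L = lambda * W.
# Inputs are the throughput and scan-time figures quoted above.
# Assumption: "scan time" covers the full time a message holds a slot.

def implied_concurrency(mails_per_hour: float, scan_seconds: float) -> float:
    """Average number of messages being scanned at any instant."""
    return mails_per_hour / 3600.0 * scan_seconds


def relative_drop(old: float, new: float) -> float:
    """Fractional decrease going from old to new."""
    return (old - new) / old


redhat = implied_concurrency(60_000, 5.6)
sarge = implied_concurrency(40_000, 4.8)

print(f"Redhat: {redhat:.1f} messages in flight on average")   # 93.3
print(f"Sarge:  {sarge:.1f} messages in flight on average")    # 53.3
print(f"Throughput drop: {relative_drop(60_000, 40_000):.0%}") # 33%
```

The Redhat value comes out above the 70 incoming connections configured
in qmail, so the inputs are evidently approximate and the result is best
read as a rough comparison: the Sarge box appears to keep far fewer
messages in flight, even though each one is scanned faster.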

Interestingly, if I time the internal handling of the email, Sarge
seems to win:

1) Unpacking the email with Ripmime is about 5% faster than on the old
Redhat system.
2) Spam scanning the email is about 10% faster than on the old Redhat
system.
3) Perl handling of the email is about 2% faster.
4) ClamAV scans about 4% faster.

(The above numbers are averages taken over 4 days of mail flow, about
3.9 million emails.)

This leads me to think that the system cannot handle as many emails as
before because it simply does not accept enough connections (i.e. the
connections time out on the SMTP port before even getting to the
scanners). To pursue this idea I have tried the following:

1) Changing the file system to XFS and to EXT3.
2) Running Reiserfs with notail, nodiratime and noatime.
3) Renicing qmail-smtpd so it gets higher priority than spamd (hoping
that this would lead to more connections getting handled).
4) Changing the I/O scheduler to deadline (elevator=deadline).
5) Changing the kernel to 2.4.
6) Turning the firewall (iptables) completely off.

None of it has worked. And yes, I do get 60,000 incoming connections
per hour; most of them just seem to time out and get handled by the
next MX in my DNS.

To see if the server could take the load on its own, I tried changing
my MX to contain only this one server. This made the load jump to 98.9,
and the server eventually died with around 55 defunct perl processes
floating around. My old Redhat server could handle being the only mail
server just fine (with a load average around 28.5).
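
For anyone wanting to watch the defunct processes build up, a minimal
sketch that counts zombies per command name by reading /proc is shown
here. It is Linux-only and assumes the usual /proc/<pid>/stat layout
(pid, command name in parentheses, then a one-letter state where Z means
zombie). The function names are illustrative, not from any existing
tool.

```python
import os


def process_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the state is taken after the LAST ')'.
    """
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def process_name(stat_line: str) -> str:
    """Return the command name between the first '(' and the last ')'."""
    return stat_line[stat_line.find("(") + 1:stat_line.rfind(")")]


def count_defunct(proc_root: str = "/proc") -> dict:
    """Count zombie (state 'Z') processes per command name."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if process_state(line) == "Z":
            name = process_name(line)
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Calling count_defunct() every few seconds while the load climbs would
show whether the perl zombies accumulate steadily or appear all at once
near the point where the server stops responding.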

So as it is now I am a bit baffled by the slowness of Sarge, because
all the other systems I have converted to Sarge and kernel 2.6 have
run significantly faster (database servers, web servers, name
servers...).

So my question is this: does anyone know of any limitation in Sarge
(such as a default limit on incoming connections, not that I have ever
heard of such a thing) that would cause my system to degrade in the way
that it has? When I telnet to port 25 I simply do not get a connection
fast enough (most of them time out), which leads me to suspect that
something is wrong.
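
The manual telnet test can be made repeatable with a small script. The
sketch below times the TCP connect and the wait for the SMTP greeting
separately, which may help tell a connection that is never accepted
apart from one that is accepted but greeted slowly. The function names
and the 30 second default timeout are assumptions chosen for
illustration.

```python
import socket
import time


def time_smtp_banner(host: str, port: int = 25, timeout: float = 30.0):
    """Return (connect_seconds, banner_seconds, banner) for one attempt.

    Raises OSError (including socket.timeout) if the connection or the
    banner read does not complete within `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        # The server speaks first in SMTP, so just wait for its greeting.
        banner = sock.recv(512).decode("ascii", errors="replace").strip()
        greeted = time.monotonic()
    return connected - start, greeted - connected, banner


def sample(host: str, attempts: int = 20, port: int = 25,
           timeout: float = 30.0):
    """Try several connections; failed attempts are recorded as None."""
    results = []
    for _ in range(attempts):
        try:
            results.append(time_smtp_banner(host, port, timeout))
        except OSError:
            results.append(None)
    return results
```

For example, sample("mail.example.com") (a placeholder host name) run
from another machine during peak load gives a list in which the share of
None entries is the fraction of timed-out attempts, and the two timings
show where the successful attempts spent their time.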



Regards.

Lars Roland


