Re: Can we build a proper email cluster? (was: Re: Why is debian.org email so unreliable?)
On Wed, 13 Oct 2004 07:29, Henrique de Moraes Holschuh <hmh@debian.org> wrote:
> We have a lot of resources, why can't we invest some of them into a small
> three or four machine cluster to handle all debian email (MLs included),
A four-machine cluster can handle the entire email needs of a 500,000-user
ISP. I really doubt that we need so much hardware.
> and tune the entire thing from the ground up just for that? And use it
> *only* for that? That would be enough for two MX, one ML expander and one
> extra machine for whatever else we need. Maybe more, but from two (master +
> murphy) to four optimized and exclusive-for-email machines should be a
> good start :)
I think that front-end MX machines are a bad idea in this environment. It
means that more work is required to correctly return 55x codes in response to
non-existent recipients. That is vitally important for list servers, which
receive huge volumes of mail to random-name@list-server and which should not
generate bounces for it.
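To illustrate the point (a minimal sketch, not Debian's actual MTA
configuration, with an invented recipient table): the machine that accepts
RCPT TO must know every valid address so it can reject at SMTP time with a
55x permanent failure, instead of accepting the message and generating a
bounce to a probably-forged sender later.

```python
# Hypothetical recipient table standing in for whatever the MTA really
# consults (LDAP, a local DB, etc.) -- addresses here are invented.
VALID_RECIPIENTS = {
    "debian-isp@lists.example.org",
    "listmaster@lists.example.org",
}

def rcpt_to_response(address: str) -> str:
    """Return the SMTP response an MX should give at RCPT TO time."""
    if address.lower() in VALID_RECIPIENTS:
        return "250 2.1.5 Ok"
    # Rejecting here means no bounce is ever generated, so forged
    # senders never receive backscatter from the list server.
    return "550 5.1.1 User unknown"
```

A front-end MX that lacks this table has to accept everything and bounce
later, which is exactly the failure mode described above.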
We don't have the performance requirements that would require front-end MX
machines.
> collaborative work needs the MLs in tip-top shape, or it suffers a LOT. Way,
> way too many developers use @debian.org as their primary Debian contact
> address (usually the ONLY well-advertised one), and get out of the loop
> every time master.d.o croaks.
OK, having a single dedicated mail server instead of a general machine like
master makes sense.
> One of the obvious things that come to mind is that we should have MX
> machines with very high disk throughput, of the kinds we need RAID 0 on top
> of RAID 1 to get. Proper HW RAID (defined as something as good as the
> Intel SRCU42X fully-fitted) would help, but even LVM+MD allied to proper
> SCSI U320 hardware would give us more than 120MB/s read throughput (I have
> done that).
U320 is not required. I don't believe that you can demonstrate any
performance difference between U160 and U320 for mail server use if you have
less than 10 disks on a cable. Having large numbers of disks on a cable
brings other issues, so I recommend a scheme that has only a single disk per
cable (S-ATA or Serial Attached SCSI).
RAID-0 on top of RAID-1 should not be required either. Hardware RAID-5 with an
NV-RAM log device should give all the performance that you require.
You will NEVER see 120MB/s read throughput on a properly configured mail
server that serves fewer than about 10,000,000 users! When I was running the
servers for 1,000,000 users there was a total of about 3MB/s (combined read
and write) on each of the five back-end servers: 15MB/s in total, spread
across 20 U160 15K rpm disks (four per server). The bottlenecks were all
seeks; nothing else mattered.
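The arithmetic behind those figures (using only the numbers quoted above)
makes the seek-bound argument concrete: per disk, the sustained transfer rate
is well under 1MB/s, so bus bandwidth (U160 vs U320) is irrelevant.

```python
# Figures taken from the paragraph above; nothing here is measured anew.
servers = 5
throughput_per_server_mb = 3      # combined read + write, MB/s per server
disks_per_server = 4

total_throughput_mb = servers * throughput_per_server_mb  # 15 MB/s total
total_disks = servers * disks_per_server                  # 20 disks
per_disk_mb = total_throughput_mb / total_disks           # 0.75 MB/s/disk

# Each disk moves less than 1 MB/s: the arms spend their time seeking,
# not transferring, which is why a faster bus buys nothing here.
```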
> Maybe *external* journals on the performance-critical filesystems would
> help (although data=journal makes that a *big* maybe for the spools, the
> logging on /var always benefits from an external journal). And in that case,
> we'd obviously need two IO-independent RAID arrays. That means at least 6
> discs, but all of them can be small disks.
http://www.umem.com/16GB_Battery_Backed_PCI_NVRAM.html
If you want to use external journals then use a umem device for them. The
above URL advertises NV-RAM devices with capacities up to 16GB that run at
64-bit/66MHz PCI speed. Such a device takes less space inside a PC than real
disks, produces less noise, has no moving parts (good for reliability), and
has ZERO seek time as well as massive throughput.
Put /var/spool on that as well as the external journal for the mail store and
your mail server should be decently fast!
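A rough illustration of why zero seek time matters for a spool doing many
small synchronous writes (queue files, journal commits). The latency numbers
below are typical textbook figures for a 15K rpm SCSI disk, not measurements
of any particular drive:

```python
# Assumed per-IO latencies for a 15K rpm disk (illustrative, not measured).
avg_seek_ms = 3.5          # typical average seek for a 15K rpm SCSI disk
avg_rotational_ms = 2.0    # half a revolution at 15,000 rpm (4 ms/rev)
per_io_ms = avg_seek_ms + avg_rotational_ms   # 5.5 ms per random IO

disk_iops = 1000 / per_io_ms   # roughly 180 random IOs per second

# An NV-RAM card has no mechanical latency at all: small synchronous
# writes complete at PCI speed, so the spool stops being seek-bound.
```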
> The other is to use a filesystem that copes very well with power failures,
> and tune it for spool work (IMHO a properly tuned ext3 would be best, as
> XFS has data integrity issues on crashes even if it is faster (and maybe
> the not-even-data=ordered XFS way of life IS the reason it is so fast). I
> don't know about ReiserFS 3, and ReiserFS 4 is too new to trust IMHO).
reiserfsck has a long history of not being able to fix all possible errors. A
corrupted ReiserFS file system can cause a kernel oops, and the developers
don't consider that to be a serious issue.
ext3 is the safe bet for most Linux use. It is popular enough that you can
reasonably expect that bugs get found by someone else first, and the
developers have a good attitude towards what is a file system bug.
> The third is to not use LDAP for lookups, but rather cache them all in a
> local, extremely fast DB (I hope we are already doing that!). That alone
> could get us a big speed increase on address resolution and rewriting,
> depending on how the MTA is configured.
I've run an ISP with more than 1,000,000 users with LDAP used for the
back-end. The way it worked was that mail came to front-end servers which
did LDAP lookups to determine which back-end server to deliver to. The
back-end servers did LDAP lookups to determine the directory to put the mail
in. When users checked mail via POP or IMAP Perdition did an LDAP lookup to
determine which back-end server to proxy the connection to, and then the
back-end server had Courier POP or IMAP do another LDAP lookup. It worked
fine with about 5 LDAP servers for 1,000,000 users.
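The two-tier lookup described above can be sketched roughly as follows, with
plain dicts standing in for the LDAP directory (the usernames, hostnames, and
paths are all invented for illustration):

```python
# First LDAP attribute: which back-end server owns this user?
# Consulted by the front-end MX and by the Perdition POP/IMAP proxy.
USER_TO_BACKEND = {
    "alice": "be1.mail.example.net",
    "bob": "be2.mail.example.net",
}

# Second LDAP attribute: where on that back-end does mail get delivered?
# Consulted by the back-end MTA and by Courier POP/IMAP.
USER_TO_MAILDIR = {
    "alice": "/var/spool/store/a/alice/Maildir",
    "bob": "/var/spool/store/b/bob/Maildir",
}

def route_message(user: str) -> str:
    """Front-end / proxy step: pick the back-end server for this user."""
    return USER_TO_BACKEND[user]

def delivery_path(user: str) -> str:
    """Back-end step: find the directory the message actually lands in."""
    return USER_TO_MAILDIR[user]
```

The point of the split is that front-end machines need no local mail store at
all; every routing decision is one directory lookup.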
As we have far fewer users we should be able to use a single LDAP server with
no performance issues. If there are LDAP performance issues then they
shouldn't be difficult to solve; I can offer advice on this if I am given
details of what's happening.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
--
Archive: file://master.debian.org/~debian/archive/debian-isp/