
Re: Courier or Cyrus



On Tue, 15 Mar 2005, Michael Loftis wrote:
> >staff. We were using it for around 10k users, so it might work for you
> >with 400k, but you are going to need Cyrus gardeners. We have found it

It will work fine with 30-50k users per server, if the server is beefy enough
(a really big amount of RAM, and very fast IO).  If you use ridiculously
overpowered (usually non-Linux) boxes, that number could go much higher.  It
scales easily to more servers (in a flat IMAP namespace, even, i.e. you have
cluster-wide shared folders).

But don't try it with anything less than upstream 2.2, or Debian 2.1.17, if
you don't want trouble.

> >to be extremely cranky and in need of constant babysitting. The
> >various databases are often getting corrupted, causing mysterious or
> >non-existent errors which only become clear when using strace to walk

Looks like Cyrus 1.x, 2.0, or 2.1 *without* the extensive amount of
patching that is in the Debian package, and using early versions of Debian's
(or upstream) Berkeley DB 4.x.

Debian Cyrus 2.1 and upstream 2.2 plus a sane BDB 4.2 (like the one in
Debian sid/sarge) or BDB 3.2 (slower, but really mature and trouble-free in
Debian, at least) should not give you any trouble.

> >through the assembly calls. Quota information often just disappears or
> >gets corrupted. Because of chronic problems and bugs (like runaway or

I have seen reports of this, but not in Debian Cyrus 2.1.

> >halted processes which prevented any mail from getting delivered) we

I should try to kill -STOP a Cyrus lmtpd process during writes to see if it
still hangs the mailbox that was being written to, though.  A whole lot of
design changes and extra resilience code that tries to avoid these issues
went into Debian 2.1, and even more of it into upstream 2.2.

> >had to turn off the 'features' of squat indexes and duplicate delivery
> >prevention. The program that is supposed to "repair" your broken stuff
> >is actually a no-op and nobody knows why, nor does anyone appear
> >interested in fixing it.

Fixing squat is damn easy: just remove the indexes with find and/or
regenerate them.  The duplicate delivery database is easy too (just
delete it, or run the usual Berkeley DB recovery tools).  I *do* wish
someone would rewrite that damn squat code to be resilient against
corruption, but I am not touching that one.
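
For anyone who would rather script the "remove the indexes" step than type a
find one-liner, here is a minimal sketch.  It assumes each mailbox directory
keeps its squat index in a file named cyrus.squat, and that you pass it your
own spool root (the /var/spool/cyrus/mail in the comment is only an example
path; check your configuration).  The function name is made up for
illustration.  It defaults to a dry run, so nothing is deleted until you ask.

```python
import os


def remove_squat_indexes(spool_root, dry_run=True):
    """List every cyrus.squat file under spool_root; delete them unless dry_run.

    Assumption: the squat index of a mailbox is a file called "cyrus.squat"
    inside that mailbox's directory.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        if "cyrus.squat" in filenames:
            path = os.path.join(dirpath, "cyrus.squat")
            found.append(path)
            if not dry_run:
                os.remove(path)
    return sorted(found)


# Example (hypothetical spool path):
#   for path in remove_squat_indexes("/var/spool/cyrus/mail"):
#       print("would remove", path)
```

Regenerating the indexes afterwards is a separate step and is left out here.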

Not that you need squat unless your users like 10000+ mails per folder, at
which point courier-imap should give you crappy performance due to the lack
of indexes AFAIK, and you might need Cyrus anyway for that reason alone.

The mailbox database can be backup-dumped to cleartext at any time, and
regenerated from that without any fuss.  In Berkeley DB mode, backups of the
database are made at every checkpoint (for a large number of users, you
had better checkpoint every 5 minutes or so, or face long BDB startup times).

The mailbox repair tool (reconstruct) works just fine and can be fired
remotely through IMAP.  The only thing it no longer does, I think, is try to
regenerate the mailbox database, and that I have covered above.

The quota database can be rebuilt (and in fact, I do it regularly from cron
where possible, just in case -- I do not trust the quota code in 2.1 that
much).

Seen databases are a problem.  If they get corrupted you will probably have
to resort to losing some seen-flag data to fix it.  The mailbox repair tool
will do it automatically AFAIK, except in very few cases (in 2.1; I sure
hope there are none left in 2.2) where you have to manually delete the damn
indexes for the repair tool to work.

> million emails/day.  The only issue we've run into is an occasional 
> deadlock with POP3 that requires rebooting.  And that happens once every 
> few months, if that.  This is with 2.1.17.

The Debian packages should fix that POP3 deadlock possibility, I think.
Something else that can cause POP3 deadlocks is apop (enabled by default),
if your box runs out of /dev/random entropy.  Just disable apop if you don't
need it.
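
If you want to check whether entropy starvation is a plausible culprit before
turning APOP off, a quick sketch like the one below can be polled from a
monitoring script.  It assumes a Linux kernel that exposes the pool estimate
in /proc/sys/kernel/random/entropy_avail; the 256-bit threshold and both
function names are arbitrary choices for illustration, not anything Cyrus
defines.

```python
def entropy_available(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's entropy estimate in bits, as read from path."""
    with open(path) as f:
        return int(f.read().strip())


def entropy_is_low(bits, threshold=256):
    """True when the estimate is below threshold (an arbitrary cut-off)."""
    return bits < threshold


# Example:
#   if entropy_is_low(entropy_available()):
#       print("entropy pool is nearly empty")
```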

But a deadlock that requires a reboot? Don't you mean a Cyrus restart? If it
requires a _reboot_, something is very wrong indeed, and I have never heard of
anything like this.

> Most of our problems go back to incoherent clients, mostly Outlook, but 
> occasionally Thunderbird -- turn on its junk mail filtering, file to say 

Amen to that.  I think Cyrus 2.2 and Debian Cyrus 2.1 can handle most
outlooketies just fine, but I would not bet anything on it.  One must always
be on the lookout for any new version of Outlook and all the bugs that it
brings with it...

> >entire point of maildirs). For example, maildir makes backing up and
> >restoring ranges of email very easy.

Cyrus does that too, since it is just MH + indexes.  You just reload from
backup media and run a reconstruct on that single mailbox, and incremental
backups work just fine (the message data is never changed by Cyrus once in
the spool, just the indexes).
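
To illustrate why incrementals are cheap: since message files are written once
and never modified, picking up "everything newer than the last backup" is
enough.  A minimal sketch, assuming message files are named as a number
followed by a dot (1., 2., ...) and that the cyrus.* files are indexes that
reconstruct can rebuild.  The function name and the timestamp handling are
illustrative only; a real backup tool does this for you.

```python
import os
import re

# Assumption: message files are named "<number>." inside the mailbox directory.
MESSAGE_FILE = re.compile(r"^\d+\.$")


def new_message_files(mailbox_dir, since):
    """Return message file names modified after `since` (seconds since epoch).

    Index files (cyrus.index and friends) are skipped on purpose.
    """
    picked = []
    for name in os.listdir(mailbox_dir):
        path = os.path.join(mailbox_dir, name)
        if (MESSAGE_FILE.match(name)
                and os.path.isfile(path)
                and os.path.getmtime(path) > since):
            picked.append(name)
    # Sort numerically so that "10." comes after "9."
    return sorted(picked, key=lambda n: int(n[:-1]))
```

After restoring such files into the mailbox directory, you would still run a
reconstruct on that mailbox, as described above.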

Hmm, come to think of it, maildir is probably superior in that it stores
the message flags in-band, which is nicer for restores (Cyrus will mark all
restored messages as unread and lose all other flags on those messages as
well).  OTOH, how does one manage to have annotations and per-user flags
(such as seen state) using maildir?

> There is one serious problem.  He'll need multiple servers since the unix 
> UID limit hits at 65535.  So you can only get about 64k users created per 

Huh?  More like 2^31 in Debian Linux 2.4/2.6 kernels.  But over 64K users
per box will mean bad performance unless these servers are really something
else, or your users are not that active to begin with, so I don't think
this counts against courier-imap.

> I've yet to find a good way to backup and restore cyrus mail...basically 
> it's a pain to do.  Thankfully we've only ever very rarely had a need to 
> restore mail.

There is a lot of stuff about it on the Cyrus wiki and ML archives.  I too
rarely need to restore mail, and simple Amanda incremental backups work just
fine for that (the effort it takes on restores is caused by Amanda, not
Cyrus).

> With courier I don't know if it has any ability like the MURDER system in 
> Cyrus that allows you to create a scaleable (not redundant) cluster and 

Redundancy using murder is coming soon, I think.  So far, people do the
usual HA things for redundancy (but don't even think about Cyrus on top of
NFS :P).

> have the system automatically route mail internally to the correct mail 
> store system.  You can do the same with LDAP and postfix though with Cyrus. 
> and say something like perdition.

AS LONG AS you do not need cluster-wide shared mailboxes, at which point a
Cyrus murder cluster is the only thing that will work, AFAIK (and I would
love to hear otherwise!).

> That's a LOT of mail.  You could easily need half a dozen very beefy boxes 
> to handle that much mail depending on how much spam/virus/etc features you 
> want.

Indeed.  Spam and AV processing for this would require a small cluster of
workers of its own, with good local IO and lots of CPU power (and a good
enough amount of RAM that you can use for IO caching).

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


