Re: Parallelizing fetchmail
On May 21, 2006, at 12:52 PM, Daniele Cortesi wrote:
I recently uninstalled exim on my home pc, replacing it with esmtp
for outbound mail and fetchmail->procmail for inbound traffic.
Procmail checks every message for spam and viruses, introducing some
seconds of latency, mainly because of DNSRBL checks of spamc.
The disadvantage of this is that fetchmail launches only one procmail
for each message and waits for its termination. This leads to a very
long delay when downloading many messages.
I can check more than one message in parallel with spamd (by giving it
more children), but I cannot find a configuration that actually gets
faster with more spamd children. The bottleneck is always fetchmail, which
processes messages one by one.
Have you got any ideas about how to insert a queue in the chain?
I can replace procmail with maildrop or similar if necessary. Please
avoid solutions like "re-install exim" or "install <insert your
favourite mta here>".
How much of a delay are you experiencing?
Are these messages all coming through one popbox?
Others may have better info, but I don't think you can run fetchmail
in parallel--at least not more than one process per user.
From "man fetchmail"
Only one daemon process is permitted per user; in daemon
mode, fetchmail makes a per-user lockfile to guarantee this.
I do have two ideas though (N.B. you may substitute IMAP for POP):
A: Set up multiple virtual users "fetchm_1,..., fetchm_n" all
fetching from the same popbox and run a daemon for each of them. I'd
be careful though--having multiple processes writing to the same mbox
files is probably asking for trouble.
B: Use intermediate popboxes as queues--essentially establishing a
three-stage pipeline:
1. fetchmail/procmail to distribute incoming messages to
multiple local popboxes (mailq_1, ..., mailq_n)
2. n fetchmail/spamc daemons, one per popbox, to filter
spam (mailq_i -> mailq_nospam_i). These will run in parallel.
3. 1 fetchmail/procmail daemon to collect and redistribute the
filtered messages.
This approach will require a POP server on your local machine
as well as virtual users for each of the mailq daemons.
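
To make the shape of (B) concrete, here is a rough sketch of the same
deal / filter-in-parallel / collect pattern, with in-memory queues standing
in for the local popboxes. It is only an illustration, not a fetchmail
setup: check_spam() is a hypothetical placeholder for the slow spamc call,
and run_pipeline() and the queue names are made up for this example.

```python
import queue
import threading
import time


def check_spam(message):
    """Hypothetical stand-in for the slow spamc/DNSRBL check."""
    time.sleep(0.05)  # simulated per-message latency
    return "spam" not in message.lower()


def worker(inbox, outbox):
    """Stage 2: one filter per queue, like one daemon per mailq_i."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more mail for this queue
            break
        if check_spam(message):
            outbox.put(message)


def run_pipeline(messages, n=4):
    inboxes = [queue.Queue() for _ in range(n)]  # mailq_1 .. mailq_n
    outbox = queue.Queue()                       # filtered mail
    threads = [
        threading.Thread(target=worker, args=(q, outbox)) for q in inboxes
    ]
    for t in threads:
        t.start()

    # Stage 1: deal incoming messages round-robin across the queues.
    for i, message in enumerate(messages):
        inboxes[i % n].put(message)
    for q in inboxes:
        q.put(None)

    for t in threads:
        t.join()

    # Stage 3: collect whatever survived filtering.
    clean = []
    while not outbox.empty():
        clean.append(outbox.get())
    return clean
```

With n workers the slow checks overlap, so the total wait should be roughly
the per-message latency times (number of messages / n) rather than times
the full message count. Note that delivery order is not preserved.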
I kinda like (B) because:
- The queues are explicit.
- Distribution can be configured as either a dumb dealer or a
subject/priority sorter.
- The spam filtering can be scaled, or off-loaded to another machine.
- Distribution and collection processes are disjoint. They _could_
be performed by a single fetchmail daemon.
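
As a sketch of the two distribution styles mentioned in the list above,
the functions below pick a queue index for each raw message. Everything
here is an assumption for illustration: the function names are invented,
queue 0 is arbitrarily reserved for urgent mail, and X-Priority is used
only because it is a common (non-standard) header.

```python
import itertools
import zlib
from email import message_from_string


def make_dumb_dealer(n):
    """Return a dealer that ignores content and cycles 0, 1, ..., n-1."""
    counter = itertools.cycle(range(n))

    def deal(raw_message):
        return next(counter)

    return deal


def priority_dealer(raw_message, n):
    """Send urgent mail to queue 0, spread the rest over queues 1..n-1.

    Assumes n >= 2. Non-urgent mail is placed by a checksum of the
    subject, so messages with the same subject land in the same queue.
    """
    msg = message_from_string(raw_message)
    priority = (msg.get("X-Priority") or "3").strip()
    if priority[:1] in ("1", "2"):  # 1 = highest, 2 = high
        return 0
    subject = msg.get("Subject") or ""
    return 1 + zlib.crc32(subject.encode("utf-8")) % (n - 1)
```

The dumb dealer keeps the queues evenly loaded; the sorter trades some
balance for letting important mail skip ahead of a backlog.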
Of course these are just theoretical ruminations... do you feel lucky?