
Re: Parallelizing fetchmail




On May 21, 2006, at 12:52 PM, Daniele Cortesi wrote:

Hello *,
 I recently uninstalled exim on my home pc, replacing it with esmtp
for outbound mail and fetchmail->procmail for inbound traffic.

Procmail checks every message for spam and viruses, introducing some
seconds of latency, mainly because of DNSRBL checks of spamc.

The disadvantage of this is that fetchmail launches only one procmail
for each message and waits for its termination. This leads to a very long
delay when downloading many messages.

I can check more than one message in parallel with spamd (making it create
more children), but I cannot find a configuration that gets faster with
more spamd children. The bottleneck is always fetchmail, which processes
every message one by one.

Have you got any ideas about how to insert a queue in the chain?

I can replace procmail with maildrop or similar if necessary. Please
avoid solutions like "re-install exim" or "install <insert your
favourite mta here>".

How much of a delay are you experiencing?
Are these messages all coming through one popbox?

Others may have better info, but I don't think you can run fetchmail in parallel--at least not more than one process per user.
From "man fetchmail"

Only one daemon process is permitted per user; in daemon mode, fetchmail makes a per-user lockfile to guarantee this.

I do have two ideas though (N.B. you may substitute IMAP for POP):
A: Set up multiple virtual users "fetchm_1,..., fetchm_n" all fetching from the same popbox and run a daemon for each of them. I'd be careful though--having multiple processes writing to the same mbox files is probably asking for trouble.

B: Use intermediate popboxes as queues--essentially establishing a multi-stage dataflow:
 1. fetchmail/procmail to distribute incoming messages to multiple local popboxes (mailq_1, ..., mailq_n).
 2. n fetchmail/spamc daemons, one per popbox, to filter spam (mailq_i -> mailq_nospam_i). These will run in parallel.
 3. 1 fetchmail/procmail daemon to collect and redistribute the messages.
This approach will require a pop server on your local machine as well as virtual users for each of the mailq daemons.
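
To make the shape of (B) concrete, here is a minimal sketch of the three stages in Python. It is a toy model, not a fetchmail or spamd configuration: the lists stand in for the local popboxes, check_message is a hypothetical placeholder for the spamc call (it just sleeps and looks for a keyword), and all function names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_message(msg):
    """Hypothetical stand-in for spamc: sleep to mimic DNSRBL latency,
    then flag the message with a toy keyword rule."""
    time.sleep(0.05)
    return msg, "viagra" in msg.lower()


def distribute(messages, n):
    """Stage 1: deal messages round-robin into n queues (mailq_1 .. mailq_n)."""
    queues = [[] for _ in range(n)]
    for i, msg in enumerate(messages):
        queues[i % n].append(msg)
    return queues


def filter_queue(queue):
    """Stage 2: one worker drains one queue and drops flagged messages
    (mailq_i -> mailq_nospam_i)."""
    return [msg for msg, is_spam in map(check_message, queue) if not is_spam]


def collect(filtered_queues):
    """Stage 3: merge the filtered queues back into a single stream."""
    return [msg for queue in filtered_queues for msg in queue]


def run_pipeline(messages, n):
    queues = distribute(messages, n)
    # The n stage-2 workers run concurrently, one per queue.
    with ThreadPoolExecutor(max_workers=n) as pool:
        filtered = list(pool.map(filter_queue, queues))
    return collect(filtered)
```

Note that the collected stream is grouped by queue, so the original arrival order is not preserved. If the per-message check dominates the run time, the filtering stage should take roughly 1/n of the serial time, as long as the checks really can overlap.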

I kinda like (B) because:
 - The queues are explicit.
 - Distribution can be configured as either a dumb dealer, or a subject/priority sorter.
 - The spam filtering can be scaled, or off-loaded to another machine.
 - Distribution and collection processes are disjoint. They _could_ be performed by a single fetchmail daemon.
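
The difference between a dumb dealer and a sorter can be sketched as two interchangeable "choose a queue" functions. This is only an illustration in Python: make_dealer, priority_sorter and route are hypothetical names, and routing on the X-Priority header (with "1" meaning highest) is an assumption about how the mail is marked, not something fetchmail or procmail does by itself.

```python
from email import message_from_string
from itertools import count


def make_dealer(n):
    """Dumb dealer: ignore the content, hand out queue indices 0..n-1 in rotation."""
    counter = count()
    return lambda raw: next(counter) % n


def priority_sorter(raw):
    """Sorter: queue 0 for urgent mail, queue 1 for everything else.
    Assumes urgency is signalled by an X-Priority header starting with '1'."""
    msg = message_from_string(raw)
    prio = (msg.get("X-Priority") or "").strip()
    return 0 if prio.startswith("1") else 1


def route(raw_messages, n, choose):
    """Place each raw message in the queue picked by the choose function."""
    queues = [[] for _ in range(n)]
    for raw in raw_messages:
        queues[choose(raw)].append(raw)
    return queues
```

For example, route(msgs, 2, make_dealer(2)) spreads the load evenly, while route(msgs, 2, priority_sorter) keeps urgent mail in its own queue so it is not stuck behind a backlog.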

Of course these are just theoretical ruminations... do you feel lucky?

--rich






