
Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)



On 12/9/19 8:06 am, Zenaan Harkness wrote:
On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
Why is Gnu sieve so extremely fast at batch processing an mbox file,
while Dovecot's sieve-filter is an order of magnitude slower?

Sequence:

- mpop or getmail to pipeline download emails into temp mbox file
- filter that file


Have you tried piping the mail directly into the Dovecot LDA so that it calls sieve?
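
A minimal sketch of that suggestion, assuming `formail` (from the procmail package) is installed, that `dovecot-lda` sits at its usual Debian path, and that the LDA configuration loads the sieve plugin. None of this is confirmed in the thread, and `deliver_mbox` is a made-up helper name.

```shell
#!/bin/sh
# Assumed location of the Dovecot LDA binary; adjust for your system.
LDA=${LDA:-/usr/lib/dovecot/dovecot-lda}

# deliver_mbox (hypothetical helper): split an mbox into single messages
# and pipe each one into the Dovecot LDA as the current user.
deliver_mbox() {
    mbox=$1
    # formail -s runs the given program once per message read from stdin.
    formail -s "$LDA" < "$mbox"
}

# Usage (commented out so that sourcing this file delivers nothing):
# deliver_mbox email-incoming-unsorted
```

This starts one LDA process per message, so it may well be no faster than sieve-filter; it is mainly a way to compare the two delivery paths.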

There are so many moving parts to Postfix and Dovecot that it does my head in trying to figure out what I did every time I look at it.

And for my virtual domains I have Dovecot LMTP listening on TCP, and that calls sieve.




Gnu sieve just flies through a local mbox file, saving emails to
other local mbox files.

Gnu sieve rejects too many emails with "malformed" errors, so after a
few years I bit the bullet and upgraded to Dovecot's sieve-filter.

Dovecot's sieve-filter, at present, is an order of magnitude slower.

Here's my filter command (one line):
tried
/usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted

The sieve script is fine now that I have the correct "require"
clauses (hint: "capability strings").

File ~/etc/email/sieve-dovecot-config.conf:

protocols = pop
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
mail_fsync = never

There's no re-sending of emails into my local Postfix SMTP server - I
checked the system logs and confirmed this (journalctl -f).

I suspect that Gnu sieve was directly writing each email to the
appropriate sieve-determined mbox file (perhaps with only a sync at
the end of a single batch process - what I've attempted to achieve
above with sieve-filter), and that sieve-filter is instead passing
each email through some (dovecot) lda?

Here's the output for a sieve-filter batch processing of 11 emails:

$ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
# PS0 Timestamp: 20190912@07:02:23
info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'. info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=OYDeDX4FMcoRdGdQ@mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'. info: msgid=<15675101930.d5ba2E.12322@composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'. info: msgid=<23955051567513749@sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'. info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=iGdajTSzkfQ5PCZsUfyg@mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'. info: msgid=<20190903133420.GS6166@eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'. info: msgid=<0715adb7-540f-4cff-9282-e1252c53c2e8@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'. info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa17d@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'. info: msgid=<CAB5c7xpHCdFx1w3yA9FyRL-KQ8BUiCr4JbiDQRuFJj9nOgKxTg@mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'. info: msgid=<20190903151022.354xpe6ds2vglher@red.localdomain>: stored mail into mailbox 'l/as/users'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'. info: msgid=<160901d8-b903-9e9a-91ac-267571b0e24d@gmx.com>: stored mail into mailbox 'l/hl/fabric'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'. info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffcc07@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
info: message expunged from source mailbox upon successful move.
2 ▶︎️ zen@eye 20190912@07:02:30 ~ $


So about 3/4 of a second is spent by dovecot's sieve-filter, on each
email that it processes - watching it is painful given how fast Gnu
sieve has been for the last few years - it's almost (but not quite)
as slow as my previous fetchmail email download per-email time.
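
As a rough check on that figure, the elapsed time between the two prompt timestamps in the run above (07:02:23 and 07:02:30) can be divided by the 11 messages processed. A small sketch; `per_message_seconds` is a hypothetical helper, and because the timestamps only have one-second resolution the result is approximate.

```shell
#!/bin/sh
# per_message_seconds START END COUNT
# START and END are HH:MM:SS stamps from the same day; prints seconds per message.
per_message_seconds() {
    start=$1 end=$2 count=$3
    awk -v s="$start" -v e="$end" -v n="$count" 'BEGIN {
        split(s, a, ":"); split(e, b, ":")
        # Convert both stamps to seconds since midnight and take the difference.
        elapsed = (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
        printf "%.2f\n", elapsed / n
    }'
}

per_message_seconds 07:02:23 07:02:30 11   # prints 0.64
```

The same arithmetic on the other two runs gives 15 s / 20 = 0.75 s and about 240 s / 600 = 0.40 s per message, so all three runs are in the same range.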

Attached is a -D debug run of sieve-filter on 20 emails - slightly
longer than the above, and took roughly 15 seconds to run.

Any help appreciated...

On another test run of ~600 emails, sieve-filter is consistently
running ~100% of one CPU (for about 4 minutes) to process these
emails, which suggests that, despite what looks like it
should be a batch process, sieve-filter is perhaps reloading the
rules for every single email that it processes, even though I gave it
a whole mbox, and not a single email, to process.

Can sieve-filter work the way it should / the way I want it / batch
process a whole mbox - without reloading the sieve rules for every
email?
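
One way to test the reload hypothesis, assuming `strace` is available, is to trace file opens during a run and count how often the script path shows up. `count_script_opens` is a hypothetical helper, not part of Dovecot. A count close to the number of messages would support the hypothesis; a count of one or two would point elsewhere. Sieve implementations may load a compiled binary instead of the script text, so it is worth looking for similarly named files in the trace as well.

```shell
#!/bin/sh
# count_script_opens SCRIPT_NAME COMMAND [ARGS...]
# Runs COMMAND under strace and prints how many traced open calls mention SCRIPT_NAME.
# Note: COMMAND really runs, so any mail moves it performs will happen.
count_script_opens() {
    script=$1; shift
    log=$(mktemp) || return 1
    # -f follows child processes; trace=openat covers the file opens made by
    # a modern glibc (an assumption about the platform).
    strace -f -e trace=openat -o "$log" "$@" > /dev/null 2>&1
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c -F -- "$script" "$log" || true
    rm -f "$log"
}

# Usage (commented out; append your usual sieve-filter arguments):
# count_script_opens sieve.rc /usr/bin/sieve-filter -veW ...
```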

--
Warning: Do not look at laser with remaining eye.

