
Re: reading an empty directory after reboot is very slow



Quoting Vincent Lefevre (vincent@vinc17.net):
> On 2015-04-21 12:47:14 -0500, David Wright wrote:
> > Quoting Vincent Lefevre (vincent@vinc17.net):
> > > This mailbox is constantly open in a Mutt running in screen (in
> > > read-only mode). I often read it, and I modify it from time to time,
> > > either by adding new messages in the usual way, or by modifying some
> > > header of existing messages with some tool of mine (in which case, I
> > > restart Mutt to take the changes into account).
> > 
> > I guess I find it hard (not knowing what all these emails are) to get
> > my head around needing an email-style random access to 145k emails in
> > one folder.
> 
> I can then do any filtering I like to search for an old e-mail
> instead of having to look at several folders individually.

You have more of these folders with thousands of emails in them?

> > > When I wrote my tool, I thought that such a cached mapping would be
> > > useless because the mailbox would have to be read by Mutt anyway.
> > > So, there's still something I don't understand: after dropping the
> > > caches, why is Mutt fast to read the mailbox (about 1 minute), but
> > > not my tool (about 30 minutes)?
> > 
> > Because mutt caches, and its caches are persistent.
> > The default location (not in the man page, I think) is
> > ~/.mutt-cache/header-cache/
> > for the headers, and its parent for the emails themselves.
> 
> No, I don't use caches (the header cache is not enabled and the
> ~/.mutt-cache directory doesn't exist).

My fault; I had forgotten that I myself added these settings to .muttrc.
After all, that was probably in 1998.

In which case, if you want to know how come mutt is so fast, take a
look at the source. Just to mention one optimisation I would consider:
slurp the directory and sort the entries by inode. Open the files in
inode order.
And another: it's probably faster to slurp bigger chunks of each file
(with an intelligent guess of the best buffer size) and use a fast
search for \nMessage-ID rather than reading and checking line by line.
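To make those two ideas concrete, here is a rough Python sketch. It is not how mutt does it (I haven't checked the source). The function names and the 64 KiB buffer size are made up for illustration. It assumes LF line endings and an unfolded Message-ID header, and it looks only in the header block, i.e. before the first blank line.

```python
import os


def entries_by_inode(directory):
    """Return the paths of regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as it:
        # DirEntry.inode() normally comes straight from the directory listing,
        # so no extra stat() per file is needed just to get the sort key.
        entries = [(e.inode(), e.path) for e in it
                   if e.is_file(follow_symlinks=False)]
    entries.sort()
    return [path for _, path in entries]


def find_message_id(path, chunk_size=64 * 1024):
    """Return the Message-ID header value of one message file, or None.

    Reads the file in chunks of `chunk_size` bytes (an assumed value, not a
    tuned one) and stops as soon as the end of the headers has been seen.
    """
    # Leading newline so that a header on the very first line also matches.
    buf = b"\n"
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            buf += chunk
            end = buf.find(b"\n\n")  # blank line = end of headers
            if end != -1 or not chunk:
                break
    headers = buf if end == -1 else buf[:end + 1]

    needle = b"\nmessage-id:"
    pos = headers.lower().find(needle)  # header names are case-insensitive
    if pos == -1:
        return None
    start = pos + len(needle)
    stop = headers.find(b"\n", start)
    if stop == -1:
        stop = len(headers)
    return headers[start:stop].strip().decode("ascii", "replace")
```

Something like `for p in entries_by_inode("cur"): find_message_id(p)` would then walk one maildir subdirectory in inode order. Whether that actually helps depends on the filesystem and on how the files are laid out on disk.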

> > > > Have you considered running a local IMAP server to handle this (and
> > > > any other) maildir?
> > > 
> > > There would be other problems. All the tools would have to talk
> > > with this server... and for instance, mairix doesn't support IMAP.
> > 
> > Is this correct?
> > http://www.gnu.org/software/emacs/manual/html_node/gnus/Setting-up-mairix.html
> 
> This seems to be something different (nnmairix), which is not in Debian.
> For mairix:
> 
>  mairix is a program for indexing and searching locally stored email messages.
>  mairix supports Maildir, MH folders, and mbox formats.

nnmairix is a Gnus backend that talks to mairix. It's here in wheezy and jessie:
usr/share/emacs/23.4/lisp/gnus/nnmairix.el.gz editors/emacs23-el
usr/share/emacs/23.4/lisp/gnus/nnmairix.elc editors/emacs23-common
usr/share/emacs/24.4/lisp/gnus/nnmairix.el.gz editors/emacs24-el
usr/share/emacs/24.4/lisp/gnus/nnmairix.elc editors/emacs24-common

> > But I don't understand why, if you're running mairix, you need to scan the
> > emails yourself. Hasn't mairix done this already? (As well as mutt.)
> 
> Only once for indexing. This is persistent after reboots and not
> used by Mutt at all.

No, I wasn't expecting mutt to use mairix. But I thought you might be
using it. Otherwise, why do you index them?

I also wondered what the problem would be with putting the thousands
of emails in a general purpose database. Doesn't a database search and
retrieve faster than perl scripts?
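As a rough idea of what that could look like, here is a minimal sketch using SQLite from Python. The table layout and function names are invented for illustration, and it stores only a Message-ID to file path mapping, not the messages themselves. I make no claim about how it would compare in speed with your tool.

```python
import sqlite3


def open_index(db_path):
    """Open (or create) a small Message-ID -> file path index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " message_id TEXT PRIMARY KEY,"
        " path TEXT NOT NULL)"
    )
    return conn


def add_message(conn, message_id, path):
    """Record where a message lives; a later entry replaces an earlier one."""
    conn.execute(
        "INSERT OR REPLACE INTO messages (message_id, path) VALUES (?, ?)",
        (message_id, path),
    )


def lookup(conn, message_id):
    """Return the stored path for a Message-ID, or None if it is unknown."""
    row = conn.execute(
        "SELECT path FROM messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    return row[0] if row else None
```

The primary key gives an indexed lookup by Message-ID, and the index file survives reboots, so only new or changed messages would need to be rescanned.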

Cheers,
David.

