Re: reading an empty directory after reboot is very slow
Quoting Vincent Lefevre (email@example.com):
> On 2015-04-20 15:59:22 -0500, David Wright wrote:
> > Quoting Vincent Lefevre (firstname.lastname@example.org):
> > > Possibly, but individual modifications would take much more time than
> > > with Maildir (such modifications, consisting in retagging, occur from
> > > time to time).
> > I take it these are real emails being read etc with an email client
> > (like, say, mutt) at various times, rather than a dead archive of old
> > emails that you just happen to keep processing from time to time.
> This mailbox is constantly open in a Mutt running in screen (in
> read-only mode). I often read it, and I modify it from time to time,
> either by adding new messages in the usual way, or by modifying some
> header of existing messages with some tool of mine (in which case, I
> restart Mutt to take the changes into account).
I guess I find it hard (not knowing what all these emails are) to get
my head around needing an email-style random access to 145k emails in
> BTW, the best way would be to have this header in a different file,
> but Mutt has no way to support that. Alternatively, I could modify
> my tool to cache the Message-Id -> filename mapping, since this is
> what I actually need.
> When I wrote my tool, I thought that such a cached mapping would be
> useless because the mailbox would have to be read by Mutt anyway.
> So, there's still something I don't understand: after dropping the
> caches, why is Mutt fast to read the mailbox (about 1 minute), but
> not my tool (about 30 minutes)?
Because mutt caches, and its caches are persistent.
The default location (not in the man page, I think) is
for the headers, and its parent for the emails themselves.
Now, I'm taking this from a system with no maildir folders, so my
caches are for external emails, where mutt caches *both* headers and
bodies. No need for the latter with maildirs, of course.
> I meant that with the maildir format, an individual change just
> modifies the message file: this is very fast. With the mbox format,
> the whole file containing all the messages needs to be copied...
I agree, maildir is a much better choice than mbox.
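To illustrate the point, here is a rough sketch using Python's standard
mailbox module. The function name retag and the X-Label header are only
my assumptions (I don't know which header your tool touches). With a
maildir, only the one message file gets rewritten:

```python
import mailbox


def retag(maildir_path, key, header, value):
    """Replace one header of one message in a maildir.

    'key' is the mailbox module's key for the message (the unique part
    of the filename). Only that message's file is rewritten.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    msg = box[key]          # a MaildirMessage, flags and subdir included
    del msg[header]         # no error if the header is absent
    msg[header] = value
    box[key] = msg          # writes a new file and renames it over the old key
    return key
```

The mbox equivalent (mailbox.mbox followed by flush()) has to write out
the whole file again for the same one-header change.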
> > Have you considered running a local IMAP server to handle this (and
> > any other) maildir?
> There would be other problems. All the tools would have to talk
> with this server... and for instance, mairix doesn't support IMAP.
Is this correct?
But I don't understand why, if you're running mairix, you need to scan the
emails yourself. Hasn't mairix done this already? (As well as mutt.)
> > Handling those volumes of email must be bread and butter to hosting
> > services. I assume such servers build persistent caches of the
> > emails rather than just depending on the filesystem's.
> Then this wouldn't solve the problem since the slowness I observe
> occurs only when the caches are empty (typically after a reboot).
> But actually, as I've said above, only with my tool, not with Mutt.
As said, mutt caches, so you seem to have at least two cache databases
for your emails already. Is there some way you could use those?
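
Failing that, the Message-Id -> filename cache you mention needn't be
much work. A minimal sketch in Python, not a drop-in for your tool: the
JSON cache file, the function names and the layout are all my
assumptions. It keys the cache on the stable part of the maildir
filename (before the ":2,flags" suffix), so after a reboot it only has
to list cur/ and new/ and open the files it has not seen before:

```python
import email
import json
import os


def unique_part(filename):
    """Maildir names look like 'unique:2,flags'; only 'unique' is stable."""
    return filename.split(":", 1)[0]


def read_message_id(path):
    """Read the header block only (up to the first blank line)."""
    header = []
    with open(path, "rb") as fh:
        for line in fh:
            if line in (b"\n", b"\r\n"):
                break
            header.append(line)
    msg = email.message_from_bytes(b"".join(header))
    return str(msg.get("Message-Id", "")).strip()


def load_cache(cache_path):
    """Return the saved {unique part: Message-Id} dict, or {} if none yet."""
    try:
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def update_index(maildir, cache_path):
    """Return {Message-Id: current path}, opening only uncached files."""
    cache = load_cache(cache_path)
    index = {}
    seen = set()
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        for name in os.listdir(folder):
            key = unique_part(name)
            seen.add(key)
            path = os.path.join(folder, name)
            if key not in cache:
                cache[key] = read_message_id(path)
            if cache[key]:
                index[cache[key]] = path
    # Forget messages that are no longer in the maildir.
    for key in list(cache):
        if key not in seen:
            del cache[key]
    # Write to a temporary file first so a crash cannot truncate the cache.
    tmp = cache_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)
    os.replace(tmp, cache_path)
    return index
```

It still has to list the directories on every run, but it no longer
opens every message, and a flag change (which renames the file) doesn't
invalidate the entry.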