Re: reading an empty directory after reboot is very slow

To: debian-user@lists.debian.org
Subject: Re: reading an empty directory after reboot is very slow
From: David Wright <deblis@lionunicorn.co.uk>
Date: Wed, 22 Apr 2015 23:28:46 -0500
Message-id: <20150423042845.GB4457@alum>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <20150422090326.GD20604@xvii.vinc17.org>
References: <CAMLkfFRPFdSVDLa7-M1Jy8Q-uwtAfduwX6r6r4xJ8HkZ4oyOEg@mail.gmail.com> <20150413134103.GA9940@ypig.lip.ens-lyon.fr> <20150413161403464379672.NoCcsPlease@bob.proulx.com> <20150414130659.GB25575@ypig.lip.ens-lyon.fr> <20150415132315537464619.NoCcsPlease@bob.proulx.com> <20150420154421.GA17187@ypig.lip.ens-lyon.fr> <20150420205922.GC11709@alum> <20150421134821.GE18193@xvii.vinc17.org> <20150421174714.GB32498@alum> <20150422090326.GD20604@xvii.vinc17.org>

Quoting Vincent Lefevre (vincent@vinc17.net):
> On 2015-04-21 12:47:14 -0500, David Wright wrote:
> > Quoting Vincent Lefevre (vincent@vinc17.net):
> > > This mailbox is constantly open in a Mutt running in screen (in
> > > read-only mode). I often read it, and I modify it from time to time,
> > > either by adding new messages in the usual way, or by modifying some
> > > header of existing messages with some tool of mine (in which case, I
> > > restart Mutt to take the changes into account).
> > 
> > I guess I find it hard (not knowing what all these emails are) to get
> > my head around needing an email-style random access to 145k emails in
> > one folder.
> 
> I can then do any filtering I like to search for an old e-mail
> instead of having to look at several folders individually.

You have more of these folders with thousands of emails in them?

> > > When I wrote my tool, I thought that such a cached mapping would be
> > > useless because the mailbox would have to be read by Mutt anyway.
> > > So, there's still something I don't understand: after dropping the
> > > caches, why is Mutt fast to read the mailbox (about 1 minute), but
> > > not my tool (about 30 minutes)?
> > 
> > Because mutt caches, and its caches are persistent.
> > The default location (not in the man page, I think) is
> > ~/.mutt-cache/header-cache/
> > for the headers, and its parent for the emails themselves.
> 
> No, I don't use caches (the header cache is not enabled and the
> ~/.mutt-cache directory doesn't exist).

My fault; I had forgotten that I myself added these settings to .muttrc.
After all, that was probably in 1998.

In which case, if you want to know how mutt manages to be so fast, take a
look at the source. Just to mention one optimisation I would consider:
slurp the directory and sort the entries by inode, then open the files
in inode order, which on ext-style filesystems tends to follow the
on-disk layout and so cuts down on seeks.

And another: it's probably faster to slurp bigger chunks of each file
(with an intelligent guess of the best buffer size) and use a fast
search for \nMessage-ID rather than reading and checking line by line.
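
A minimal sketch of both ideas in Python, not a description of what
mutt's source actually does. The function names and the 64 KiB chunk
size are assumptions made up for illustration. It lists the directory
once, sorts the entries by inode, then reads each file in large chunks
until the end of the header block and searches that block for a
Message-ID line:

```python
import os
import re

# Hypothetical chunk size; the best value depends on the filesystem
# and on how large the headers typically are.
CHUNK = 64 * 1024

# The header block ends at the first blank line (LF or CRLF endings).
HEADER_END_RE = re.compile(rb"\r?\n\r?\n")

# Header names are case-insensitive; match only at the start of a line.
# Folded (multi-line) Message-ID headers are not handled in this sketch.
MSGID_RE = re.compile(
    rb"^Message-ID:[ \t]*(.*?)\r?$", re.IGNORECASE | re.MULTILINE
)


def files_in_inode_order(directory):
    """Return the paths of regular files in `directory`, sorted by inode."""
    with os.scandir(directory) as it:
        entries = [
            (e.inode(), e.path)
            for e in it
            if e.is_file(follow_symlinks=False)
        ]
    entries.sort()
    return [path for _, path in entries]


def message_id(path, chunk=CHUNK):
    """Return the Message-ID header value of the file at `path`, or None."""
    buf = b""
    end = None
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            buf += block
            end = HEADER_END_RE.search(buf)
            # Stop once the whole header block is in memory, or at EOF.
            if end or not block:
                break
    headers = buf[: end.start()] if end else buf
    found = MSGID_RE.search(headers)
    if not found:
        return None
    return found.group(1).decode("ascii", "replace").strip()


def index_directory(directory):
    """Map Message-ID -> path, visiting the files in inode order."""
    index = {}
    for path in files_in_inode_order(directory):
        mid = message_id(path)
        if mid is not None:
            index[mid] = path
    return index
```

Calling index_directory() on a maildir's cur/ subdirectory would give
a Message-ID to filename mapping. Whether this beats a line-by-line
reader on a cold cache would need measuring on the real mailbox.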

> > > > Have you considered running a local IMAP server to handle this (and
> > > > any other) maildir?
> > > 
> > > There would be other problems. All the tools would have to talk
> > > with this server... and for instance, mairix doesn't support IMAP.
> > 
> > Is this correct?
> > http://www.gnu.org/software/emacs/manual/html_node/gnus/Setting-up-mairix.html
> 
> This seems to be something different (nnmairix), which is not in Debian.
> For mairix:
> 
>  mairix is a program for indexing and searching locally stored email messages.
>  mairix supports Maildir, MH folders, and mbox formats.

nnmairix is a Gnus backend that talks to mairix. It's in wheezy and
jessie, shipped with the Emacs packages:
usr/share/emacs/23.4/lisp/gnus/nnmairix.el.gz editors/emacs23-el
usr/share/emacs/23.4/lisp/gnus/nnmairix.elc editors/emacs23-common
usr/share/emacs/24.4/lisp/gnus/nnmairix.el.gz editors/emacs24-el
usr/share/emacs/24.4/lisp/gnus/nnmairix.elc editors/emacs24-common

> > But I don't understand why, if you're running mairix, you need to scan the
> > emails yourself. Hasn't mairix done this already? (As well as mutt.)
> 
> Only once for indexing. This is persistent after reboots and not
> used by Mutt at all.

No, I wasn't expecting mutt to use mairix. But I thought you might be
using it. Otherwise, why do you index them?

I also wondered what the problem would be with putting the thousands
of emails in a general-purpose database. Don't databases search and
retrieve faster than Perl scripts?
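
To make the database idea concrete, here is a minimal sketch using
Python's sqlite3 module. The table name, the column names and the
function names are all made up for illustration; a real index would be
filled by scanning the maildir once and then updated as messages are
added or changed.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) a hypothetical Message-ID -> file path index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " message_id TEXT PRIMARY KEY,"
        " path TEXT NOT NULL)"
    )
    return conn


def add_messages(conn, pairs):
    """Insert or update (message_id, path) pairs in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO messages (message_id, path)"
            " VALUES (?, ?)",
            pairs,
        )


def lookup(conn, message_id):
    """Return the stored path for `message_id`, or None if it is unknown."""
    row = conn.execute(
        "SELECT path FROM messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    return row[0] if row else None
```

The primary key gives an indexed lookup by Message-ID, so finding one
message does not require touching the other files at all.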

Cheers,
David.
