Re: reading an empty directory after reboot is very slow

To: debian-user@lists.debian.org
Subject: Re: reading an empty directory after reboot is very slow
From: Vincent Lefevre <vincent@vinc17.net>
Date: Thu, 23 Apr 2015 10:22:00 +0200
Message-id: <20150423082200.GB24263@xvii.vinc17.org>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <20150423042845.GB4457@alum>
References: <20150413134103.GA9940@ypig.lip.ens-lyon.fr> <20150413161403464379672.NoCcsPlease@bob.proulx.com> <20150414130659.GB25575@ypig.lip.ens-lyon.fr> <20150415132315537464619.NoCcsPlease@bob.proulx.com> <20150420154421.GA17187@ypig.lip.ens-lyon.fr> <20150420205922.GC11709@alum> <20150421134821.GE18193@xvii.vinc17.org> <20150421174714.GB32498@alum> <20150422090326.GD20604@xvii.vinc17.org> <20150423042845.GB4457@alum>

On 2015-04-22 23:28:46 -0500, David Wright wrote:
> Quoting Vincent Lefevre (vincent@vinc17.net):
> > On 2015-04-21 12:47:14 -0500, David Wright wrote:
> > > Quoting Vincent Lefevre (vincent@vinc17.net):
> > > > This mailbox is constantly open in a Mutt running in screen (in
> > > > read-only mode). I often read it, and I modify it from time to time,
> > > > either by adding new messages in the usual way, or by modifying some
> > > > header of existing messages with some tool of mine (in which case, I
> > > > restart Mutt to take the changes into account).
> > > 
> > > I guess I find it hard (not knowing what all these emails are) to get
> > > my head around needing an email-style random access to 145k emails in
> > > one folder.
> > 
> > I can then do any filtering I like to search for an old e-mail
> > instead of having to look at several folders individually.
> 
> You have more of these folders with thousands of emails in them?

Only one. This is the goal: store old e-mail in only one place. In the
past, I was using several folders, but this was impracticable because
mail messages would naturally belong to several folders, and some
searches (such as getting all messages from/to some e-mail address)
needed a search in all folders. I've used tagging instead of folders
since 2003[*], and this is much more practical.

[*] At that time, this archive mailbox was much smaller, with
something like 20k messages.

> > No, I don't use caches (the header cache is not enabled and the
> > ~/.mutt-cache directory doesn't exist).
> 
> My fault; I had forgotten that I myself added these settings to .muttrc.
> After all, that was probably in 1998.
> 
> In which case, if you want to know how come mutt is so fast, take a
> look at the source. Just to mention one optimisation I would consider:
> slurp the directory and sort the entries by inode. Open the files in
> inode order.
> And another: it's probably faster to slurp bigger chunks of each file
> (with an intelligent guess of the best buffer size) and use a fast
> search for \nMessage-ID rather than reading and checking line by line.

I'll try to do some tests, but I would say that this doesn't matter
since what takes all the time is the I/O: when everything is in the
(Linux-level) cache, both Mutt and my tool take a few seconds.
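
For reference, here is a minimal Python sketch of the two suggestions
quoted above: open the files in inode order, and read one large chunk
per file and search it for the Message-ID header instead of checking
line by line. It assumes one message per file (as in a Maildir), LF
line endings, a header block that fits in the first chunk, and an
unfolded Message-ID header. The function names and the 64 KiB chunk
size are hypothetical; this is neither Mutt's code nor the tool
discussed in this thread.

```python
import os


def files_in_inode_order(directory):
    """Return the paths of regular files in `directory`, sorted by inode."""
    with os.scandir(directory) as it:
        # DirEntry.inode() normally comes from the directory listing itself,
        # so no per-file stat() is needed before sorting.
        entries = [
            (entry.inode(), entry.path)
            for entry in it
            if entry.is_file(follow_symlinks=False)
        ]
    entries.sort()
    return [path for _, path in entries]


def find_message_id(path, chunk_size=64 * 1024):
    """Return the Message-ID value of one message file, or None.

    Assumption: the whole header block fits in the first `chunk_size` bytes.
    """
    with open(path, "rb") as f:
        data = f.read(chunk_size)

    # The header block ends at the first empty line; ignore the body.
    end = data.find(b"\n\n")
    # Prepend "\n" so that a header on the very first line also matches.
    headers = b"\n" + (data if end < 0 else data[:end])

    # Header names are case-insensitive; bytes.lower() keeps offsets intact.
    marker = b"\nmessage-id:"
    pos = headers.lower().find(marker)
    if pos < 0:
        return None

    start = pos + len(marker)
    stop = headers.find(b"\n", start)
    if stop < 0:
        stop = len(headers)
    return headers[start:stop].strip().decode("ascii", "replace")


def message_ids(directory):
    """Yield (path, message_id) for each file, visiting files in inode order."""
    for path in files_in_inode_order(directory):
        yield path, find_message_id(path)
```

Whether the inode ordering helps depends on the filesystem and on
whether the data is already in the page cache; with a warm cache the
difference is likely to be small.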

> > Only once for indexing. This is persistent after reboots and not
> > used by Mutt at all.
> 
> No, I wasn't expecting mutt to use mairix. But I thought you might be
> using it. Otherwise, why do you index them?

I use mairix when I need a "body" search first; otherwise such a
search would be awfully slow with Mutt. Then I can open the generated
folder with Mutt, and try to do more filtering to find what I want.

> I also wondered what the problem would be with putting the thousands
> of emails in a general purpose database.

But Mutt can't use such a database.

> Don't they search and retrieve faster than perl scripts?

I use Perl scripts only to update the mailbox (the tags), not for
searching.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
