Re: reading an empty directory after reboot is very slow
On 2015-04-22 23:28:46 -0500, David Wright wrote:
> Quoting Vincent Lefevre (email@example.com):
> > On 2015-04-21 12:47:14 -0500, David Wright wrote:
> > > Quoting Vincent Lefevre (firstname.lastname@example.org):
> > > > This mailbox is constantly open in a Mutt running in screen (in
> > > > read-only mode). I often read it, and I modify it from time to time,
> > > > either by adding new messages in the usual way, or by modifying some
> > > > header of existing messages with some tool of mine (in which case, I
> > > > restart Mutt to take the changes into account).
> > >
> > > I guess I find it hard (not knowing what all these emails are) to get
> > > my head around needing an email-style random access to 145k emails in
> > > one folder.
> > I can then do any filtering I like to search for an old e-mail
> > instead of having to look at several folders individually.
> You have more of these folders with thousands of emails in them?
Only one. This is the goal: store old e-mail in only one place. In the
past, I was using several folders, but this was impracticable because
some mail messages would naturally belong to several folders, and some
searches (such as getting all messages from/to some e-mail address)
needed a search in all folders. I've used tagging instead of folders
since 2003[*], and this is much more practical.
[*] At that time, this archive mailbox was much smaller, with
something like 20k messages.
> > No, I don't use caches (the header cache is not enabled and the
> > ~/.mutt-cache directory doesn't exist).
> My fault; I had forgotten that I myself added these settings to .muttrc.
> After all, that was probably in 1998.
> In which case, if you want to know how come mutt is so fast, take a
> look at the source. Just to mention one optimisation I would consider:
> slurp the directory and sort the entries by inode. Open the files in
> inode order.
> And another: it's probably faster to slurp bigger chunks of each file
> (with an intelligent guess of the best buffer size) and use a fast
> search for \nMessage-ID rather than reading and checking line by line.
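
For reference, the inode-order idea quoted above could be sketched as
follows. This is only a minimal illustration, not what Mutt or my tool
actually does; the function name entries_by_inode is made up for the
example, and it assumes a folder with one message per file.

```python
import os

def entries_by_inode(dirpath):
    """Return the paths of the regular files in dirpath, sorted by inode.

    Hypothetical helper for illustration only. On many Linux file
    systems, opening files in inode order tends to reduce disk seeks
    when nothing is in the page cache yet.
    """
    with os.scandir(dirpath) as it:
        # DirEntry.inode() normally reuses the inode number returned by
        # the directory scan itself, so no per-file stat() is needed here.
        entries = [(e.inode(), e.path)
                   for e in it
                   if e.is_file(follow_symlinks=False)]
    entries.sort()  # sort by inode number (first tuple element)
    return [path for _, path in entries]
```

The files would then be opened in the order of the returned list.
Whether this helps depends on the file system and on how the files
were created.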
I'll try to do some tests, but I would say that this doesn't matter
since what takes all the time is the I/O: when everything is in the
(Linux-level) cache, both Mutt and my tool take a few seconds.
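
For such a test, the chunked search for \nMessage-ID suggested above
could look roughly like the sketch below. The assumptions are mine:
one message per file, LF line endings, and a Message-ID value that is
not folded onto a continuation line. The name message_id and the
default chunk size are arbitrary choices for the example.

```python
def message_id(path, chunk_size=8192):
    """Return the Message-ID value found in the headers of path, or None.

    Illustrative sketch: reads the file in chunks of chunk_size bytes and
    stops as soon as the header is found or the header section ends.
    """
    data = b"\n"  # leading newline so a header on the first line matches too
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            data += chunk
            end = data.find(b"\n\n")  # a blank line ends the header section
            headers = data if end < 0 else data[:end + 1]
            # Header names are case-insensitive, so search a lowercased copy.
            pos = headers.lower().find(b"\nmessage-id:")
            if pos >= 0:
                eol = headers.find(b"\n", pos + 1)
                if eol >= 0 or not chunk:
                    # The whole header line is available (or we hit EOF).
                    stop = eol if eol >= 0 else len(headers)
                    value = headers[pos + 1:stop].split(b":", 1)[1]
                    return value.strip().decode("ascii", "replace")
            if end >= 0 or not chunk:
                return None  # headers finished (or EOF) without a match
```

With a chunk size that covers typical headers, most files need a single
read() call. This only reduces CPU and system-call overhead, though, so
it would not change the cold-cache timings.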
> > Only once for indexing. This is persistent after reboots and not
> > used by Mutt at all.
> No, I wasn't expecting mutt to use mairix. But I thought you might be
> using it. Otherwise, why do you index them?
I use mairix when I need a "body" search first, otherwise such a
search would be awfully slow with Mutt. Then I can open the generated
folder with Mutt, and try to do more filtering to find what I want.
> I also wondered what the problem would be with putting the thousands
> of emails in a general purpose database.
But Mutt can't use such a database.
> Don't they search and retrieve faster than perl scripts?
I use Perl scripts only to update the mailbox (the tags), not for
Vincent Lefèvre <email@example.com> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)