Re: the correct way to read a big directory? Mutt?



On 2015-04-24 16:12:15 +0200, Nicolas George wrote:
> This is the corresponding entry in Mutt's ChangeLog:
> 
> 2005-01-27 18:45:37  Florian Weimer   <fw@deneb.enyo.de>  (roessler)

Yes, I eventually found this changeset too, by searching for "inode"
in Mutt's changelog.

> Reading the mailing-lists archives around that date may be interesting.

I've found that in my mail archive (thanks to "mairix inode"):

Date: Mon, 6 Oct 2003 10:04:04 +0200
From: Florian Weimer <fw@deneb.enyo.de>
To: mutt-dev@mutt.org
Subject: Maildir parsing optimization for ext3/Linux 2.6

Linux 2.6 adds hashed directory support to ext3.  As a result, readdir()
returns directory entries in a pretty wild order.  Opening files in
this order results in enormous seek overhead.  This overhead can be
reduced if the files are sorted by inode number first.

Without the patch below, mutt needs 200 seconds to open a maildir folder
with 18,500 messages which does not reside in the dentry/page cache.  If
the patch is applied, less than 15 seconds are required.  If the folder
is in cache, 3.1 vs. 3.3 seconds are needed.

If necessary, I can add some autoconf hackery to activate this
optimization only on Linux systems (where the d_ino field is always
available).

[...]

Date: Thu, 29 Nov 2007 12:35:29 +0100
From: Rocco Rutte <pdmef@gmx.net>
To: mutt-dev@mutt.org
Subject: [PATCH] Re-introduce inode sorting

Hi,

sorry that this is going to be longer, the subject is quite tricky.

The inode sorting patch was sent to mutt-dev in:

  http://does-not-exist.org/mail-archives/mutt-dev/msg00205.html

saying that it can speed up uncached maildir parsing from 200 to 15 seconds,
making it more than 13 times faster. Some concerns were raised and the code
was disabled by default.

For #2981 I've been doing some tests on my gentoo box and can confirm this
factor roughly (10 minutes for >300k messages compared to far over 2 hours).
This almost renders mutt unusable for larger folders that aren't cached yet
(e.g.  an internal structure change requires a full cache rebuild or a user
just giving mutt a try).

Googling for some messages on the kernel mailing list turns out that not doing
inode sorting on dir_index-enabled ext3 filesystems effectively makes apps
access inodes in nearly random order, so that explains the slowdown. I think
mutt needs inode sorting.

[...]
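
To make the technique in the two messages above concrete, here is a minimal
sketch in C. It is not the actual Mutt patch: the names struct entry, cmp_ino,
list_by_inode and free_entries are made up for illustration. It assumes a POSIX
system where struct dirent has a d_ino field (which the 2003 message says is
always the case on Linux). The idea is to collect all the names first, sort
them by inode number, and only then open the files in that order.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record: one directory entry's inode number and name. */
struct entry {
    ino_t ino;
    char *name;
};

/* qsort() comparator: ascending inode number. */
static int cmp_ino(const void *a, const void *b)
{
    ino_t x = ((const struct entry *)a)->ino;
    ino_t y = ((const struct entry *)b)->ino;
    return (x > y) - (x < y);
}

/* Free an array returned by list_by_inode(). */
static void free_entries(struct entry *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(v[i].name);
    free(v);
}

/* Read every entry of `path` except "." and "..", sorted by inode number.
 * Returns the number of entries, or -1 on error. On success *out is a
 * malloc()ed array (possibly NULL if the directory is empty). */
static long list_by_inode(const char *path, struct entry **out)
{
    assert(path != NULL && out != NULL);

    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    struct entry *v = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;

    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {                       /* grow the array */
            size_t ncap = cap ? cap * 2 : 64;
            struct entry *nv = realloc(v, ncap * sizeof *nv);
            if (nv == NULL)
                goto fail;
            v = nv;
            cap = ncap;
        }
        v[n].name = strdup(de->d_name);
        if (v[n].name == NULL)
            goto fail;
        v[n].ino = de->d_ino;                 /* inode number from readdir() */
        n++;
    }
    closedir(d);

    if (n > 1)
        qsort(v, n, sizeof *v, cmp_ino);      /* the actual optimization */
    *out = v;
    return (long)n;

fail:
    free_entries(v, n);
    closedir(d);
    return -1;
}
```

A caller would then walk the returned array in order and open each
v[i].name relative to the directory, instead of opening files in the order
readdir() returned them. Whether this helps depends on the filesystem and on
whether the data is already cached, as the timings quoted above suggest.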

> > Now I wonder whether the use of the hash by ext3 is a good idea...
> > 
> > Alternatively, I suppose that an SSD could improve things.
> 
> Well, filesystems can not be optimized for every use. Having myriads of
> small files has always been a bad idea anyway, it trashes the inode and
> dentries cache, it costs extra disk bandwidth (because you can not read half
> a sector at the end of the file) and latency (because of all the seeks, even
> when reading in order, it will be more fragmented than a single file), etc.

Yes, you already said that in the mutt-dev list in 2006. :-)
Nothing has changed. :-( Well, except...

> Of course, nowadays, huge RAM and SSD will mitigate the issue.
> 
> It is a tragedy that a standard, robust and efficient format for mailboxes
> was never designed and adopted.

I agree.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)