Re: reading an empty directory after reboot is very slow
Vincent Lefevre <email@example.com> writes:
>On 2015-04-13 16:28:27 -0600, Bob Proulx wrote:
>> Without dir_index an ext filesystem with large directories is slow due
>> to the linear nature of directories. But with dir_index it should be
>> using a B-tree data structure and should be much faster.
>So, why is it slow?
I don't think dir_index has anything to do with it. An index speeds up
lookups. You are not doing lookups; you are traversing the entire data
structure. A B-tree data structure can take longer to traverse than a
contiguous array data structure due to prefetching generally being
beneficial to arrays, but less so to pointer-based structures.
It's slow because every block of the directory needs to be read to get
the contents, even if every block contains empty entries. You don't know
that until you've read it.
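To make the traversal cost concrete, here is a minimal sketch that times a full walk of one directory. The function names are mine, chosen for illustration. The point is only that listing touches every entry, so the kernel has to read every directory block, whether or not an index exists.

```python
import os
import sys
import time


def count_entries(path):
    """Iterate over every entry in one directory (no recursion) and count them."""
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
    return count


def timed_count(path):
    """Return (entry_count, elapsed_seconds) for a full listing of path."""
    start = time.perf_counter()
    n = count_entries(path)
    return n, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical usage: pass the directory to measure as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    n, elapsed = timed_count(target)
    print(f"{n} entries listed in {elapsed:.3f}s")
```

Running it once with cold caches and once again straight afterwards should show the difference between reading the directory blocks from disk and reading them from the page cache.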
>I also notice slowness with a large maildir directory:
>drwx------ 2 vlefevre vlefevre 8409088 2015-03-24 14:04:33 Mail/oldarc/cur/
>In this one, the files are real (145400 files), but I have a Perl
>script that basically reads the headers and it takes a lot of time
>(several dozens of minutes) after a reboot or dropping the caches
>as you suggested above. With a second run of this script, it just
>takes 8 seconds.
Your large directory is about 3.5 times the size of this one, so, all
else being equal, we would expect it to take about 30s to read, based
on the time it takes to read your maildir.
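The arithmetic behind that estimate, as a rough sketch. The maildir size and the 8 second figure come from your message; the 30MB figure for the large directory is approximate, and I am assuming decimal megabytes here. It is a purely proportional estimate and ignores seeks entirely.

```python
MAILDIR_BYTES = 8409088           # directory size from the ls output quoted above
MAILDIR_SECONDS = 8.0             # time reported for the second (cached) run
LARGE_DIR_BYTES = 30 * 1000 ** 2  # "30MB", assumed to mean decimal megabytes


def size_ratio(large_bytes, small_bytes):
    """How many times larger one directory is than the other."""
    return large_bytes / small_bytes


def scaled_seconds(large_bytes, small_bytes, small_seconds):
    """Proportional estimate: read time scales linearly with directory size."""
    return small_seconds * size_ratio(large_bytes, small_bytes)


if __name__ == "__main__":
    ratio = size_ratio(LARGE_DIR_BYTES, MAILDIR_BYTES)
    estimate = scaled_seconds(LARGE_DIR_BYTES, MAILDIR_BYTES, MAILDIR_SECONDS)
    print(f"ratio {ratio:.2f}, estimated {estimate:.1f}s")
```

That gives a ratio a little over 3.5 and an estimate a little under 30 seconds.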
One thing that is likely not equal is fragmentation. It is quite
possible that your 30MB directory is fragmented across the disk and
involves many seeks to read it all. If you really want to know if this
is the case, use debugfs(8) to have a look:
# debugfs /dev/sda1 # substitute your own device for sda1
debugfs: blocks /path/to/directory # path relative to root of filesystem
That will output all the blocks used by the directory, in the order of
the blocks in the directory. You'll be able to see how much seeking
would be needed to read those blocks linearly.
# debugfs /dev/mapper/m500-var
debugfs 1.42.5 (29-Jul-2012)
debugfs: blocks /lib/dpkg/info
8236 8207 8204 8221 8222 8223 8231 8232 8234 8333 8394 8395 8393 8396
8399 8400 8402 8747 8913 9258 9289 9311 9433 9405 9432 9452 9407 32084
32237 32238 32236 32245 32254 9555 9978 9908
You can see the blocks are reasonably contiguous until the list jumps up
to the 32000s, and then back to the 9000s. If you see a lot of that in
your large empty directory, it will be slow to seek around the whole
lot. (In my case, that's on an SSD, so I don't care.)
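If the block list is too long to eyeball, a small script can count the jumps for you. This is only a sketch: the function names are mine, and the threshold of 1000 blocks for a "large" jump is an arbitrary assumption that you should tune for your filesystem. It expects the whitespace-separated block numbers printed by the blocks command, pasted in as text.

```python
def parse_blocks(text):
    """Turn whitespace-separated block numbers into a list of ints."""
    return [int(tok) for tok in text.split()]


def summarize_jumps(blocks, jump_threshold=1000):
    """Count adjacent pairs that are strictly contiguous and pairs that
    are further apart than jump_threshold blocks (an assumed cut-off)."""
    contiguous = 0
    large_jumps = 0
    for prev, cur in zip(blocks, blocks[1:]):
        delta = cur - prev
        if delta == 1:
            contiguous += 1
        elif abs(delta) > jump_threshold:
            large_jumps += 1
    return {
        "blocks": len(blocks),
        "contiguous_pairs": contiguous,
        "large_jumps": large_jumps,
    }


# The block list from my /lib/dpkg/info example above.
SAMPLE = """
8236 8207 8204 8221 8222 8223 8231 8232 8234 8333 8394 8395 8393 8396
8399 8400 8402 8747 8913 9258 9289 9311 9433 9405 9432 9452 9407 32084
32237 32238 32236 32245 32254 9555 9978 9908
"""

if __name__ == "__main__":
    print(summarize_jumps(parse_blocks(SAMPLE)))
```

On my example it reports two large jumps, the one up to the 32000s and the one back down. A badly fragmented directory would show many more relative to its block count.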