
Re: reading an empty directory after reboot is very slow

On 2015-04-14 23:01:19 -0000, Cam Hutchison wrote:
> I don't think dir_index has anything to do with it. An index speeds up
> lookups. You are not doing lookups; you are traversing the entire data
> structure. A B-tree data structure can take longer to traverse than a
> contiguous array data structure due to prefetching generally being
> beneficial to arrays, but less so to pointer-based structures.
> It's slow because every block of the directory needs to be read to get
> the contents, even if every block contains empty entries. You don't know
> that until you've read it.

I would have expected a B-tree data structure to shrink as elements
are removed. Otherwise, it isn't even a B-tree:

http://en.wikipedia.org/wiki/B-tree#Definition says:

  Every non-leaf node (except root) has at least ⌈m⁄2⌉ children.

So, for an empty directory, there should be only a root node.

> >I also notice slowness with a large maildir directory:
> >drwx------ 2 vlefevre vlefevre 8409088 2015-03-24 14:04:33 Mail/oldarc/cur/
> >In this one, the files are real (145400 files), but I have a Perl
> >script that basically reads the headers and it takes a lot of time
> >(several dozens of minutes) after a reboot or dropping the caches

31 minutes

> >as you suggested above. With a second run of this script, it just
> >takes 8 seconds.
> Your large directory is about 3.5 times the size of this one, so we
> would expect all things being equal that it would take 30s to read the
> larger directory based on the time of reading your maildir.
> One thing that is likely not equal is fragmentation. It is quite
> possible that your 30MB directory is fragmented across the disk and
> involves many seeks to read it all. If you really want to know if this
> is the case, use debugfs(8) to have a look:
> # debugfs /dev/sda1  # sub sda1 with your device
> debugfs:  blocks /path/to/directory  # path relative to root of filesystem

The blocks are increasing and more or less contiguous. But since the
beginning of each file needs to be read, the time may be spent there.
That would be around 12 - 13 ms for each file (31 minutes for 145400
files). But

  ioping Mail/oldarc/cur

gives 162 us on average! I'm wondering whether I need to add options,
though. Using -D is more interesting, but I don't know how to interpret
the output:

ypig:~> ioping -D Mail/oldarc/cur
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=1 time=11.8 ms
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=2 time=176 us
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=3 time=6.50 ms
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=4 time=6.09 ms
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=5 time=133 us
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=6 time=158 us
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=7 time=159 us
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=8 time=1.79 ms
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=9 time=1.90 ms
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=10 time=176 us
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=11 time=3.45 ms
4 KiB from Mail/oldarc/cur (ext3 /dev/sda1): request=12 time=158 us
--- Mail/oldarc/cur (ext3 /dev/sda1) ioping statistics ---
12 requests completed in 11.8 s, 369 iops, 1.44 MiB/s
min/avg/max/mdev = 133 us / 2.71 ms / 11.8 ms / 3.53 ms
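
As a rough cross-check of the figures in this message, here is a minimal
Python sketch of the arithmetic. It uses only the numbers quoted above
(31 minutes, 145400 files, the twelve -D samples). The last step, which
assumes exactly one 4 KiB request per file at the -D average, is a
hypothetical extrapolation and not a measurement.

```python
# Figures quoted in this thread.
cold_run_seconds = 31 * 60   # cold-cache run of the Perl script: 31 minutes
n_files = 145400             # number of files in Mail/oldarc/cur

# Average wall-clock time spent per file during the cold run.
per_file_ms = cold_run_seconds / n_files * 1000

# The twelve "ioping -D" samples listed above, converted to milliseconds.
samples_ms = [11.8, 0.176, 6.50, 6.09, 0.133, 0.158,
              0.159, 1.79, 1.90, 0.176, 3.45, 0.158]
avg_ms = sum(samples_ms) / len(samples_ms)

# Requests per second implied by that average latency.
iops = 1000 / avg_ms

# Hypothetical: one request per file at the -D average latency.
projected_minutes = n_files * avg_ms / 1000 / 60

print(f"cold run, per file : {per_file_ms:.2f} ms")
print(f"ioping -D average  : {avg_ms:.2f} ms ({iops:.0f} iops)")
print(f"projected total    : {projected_minutes:.1f} min")
```

Under that assumption the projected total stays well below the observed
31 minutes, so a single request per file at the -D average would not
account for the whole cold run.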

Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
