
Re: reading an empty directory after reboot is very slow

On Tue, Apr 14, 2015, at 10:06, Vincent Lefevre wrote:
> > Without dir_index an ext filesystem with large directories is slow due
> > to the linear nature of directories.  But with dir_index it should be
> > using a B-tree data structure and should be much faster.
> So, why is it slow?

What kernel?  Here it is rather fast (3.10, ext4, enough RAM and a reasonably modern CPU).

If your kernel is new enough, switch that ext3 to ext4, even if you're not going to tune2fs and fsck it, and you will get some extra performance.  The behavior changes, though: beware of delayed allocation if your software is not well-behaved.  If it would cause problems with XFS, it will cause problems with ext4.

> I also notice slowness with a large maildir directory:
> drwx------ 2 vlefevre vlefevre 8409088 2015-03-24 14:04:33 Mail/oldarc/cur/
> In this one, the files are real (145400 files), but I have a Perl
> script that basically reads the headers and it takes a lot of time
> (several dozens of minutes) after a reboot or dropping the caches
> as you suggested above. With a second run of this script, it just
> takes 8 seconds.

It has to read not just the directory, but also all the inodes, which could be somewhat scattered all over the disk.  On top of that, it will also need to read some of the file data since you're actually opening and reading some of the contents.

I.e. if you do anything that involves stat()ing the directory entry, it has to read the inode as well.

Preload the directory and inode metadata with ls -lR >/dev/null before you run that Perl script with cold caches, and you might get better performance, since reading the file contents will then not get in the way of batch-loading the inodes and directory entries.
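
If you would rather do the warm-up from a script than from ls -lR, here is a minimal sketch in Python of the same idea.  The function name preload_metadata is made up for this example, and I have not measured it against your maildir; it just walks the tree and stat()s every entry, which is what forces the inodes to be read.

```python
import os


def preload_metadata(root):
    """Walk root and stat every entry so the kernel caches dentries and inodes.

    Hypothetical helper, roughly what `ls -lR >/dev/null` does.
    Returns the number of entries that were stat'ed.
    """
    count = 0
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # stat() is what makes the kernel read the inode;
                # file contents are never opened in this pass.
                entry.stat(follow_symlinks=False)
                count += 1
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return count
```

Call it once on Mail/oldarc/cur (or whatever the directory is) and only then start the pass that opens the files and reads the headers.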

There is such a thing as the appropriate filesystem for the workload. ext3 is unlikely to be it nowadays, IME. And, of course, for anything where "decompress tarball to tmpfs, do the job there, tarball the results back to persistent fs (ext4 or XFS in my case), drop the tmpfs" is viable, I do just that.  It needs enough RAM, though.
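
A rough sketch of that tmpfs round trip in Python, in case it helps.  Everything named here is an assumption for illustration: run_in_scratch is a made-up helper, and /dev/shm is only assumed to be a tmpfs mount (it usually is on Linux, but check yours, or pass your own mount point).

```python
import shutil
import tarfile
import tempfile


def run_in_scratch(src_tar, dst_tar, job, scratch_root="/dev/shm"):
    """Unpack src_tar under scratch_root, run job(workdir), pack results to dst_tar.

    Hypothetical helper.  scratch_root is assumed to be on tmpfs, and
    dst_tar is assumed to live on the persistent filesystem.
    """
    workdir = tempfile.mkdtemp(dir=scratch_root)
    try:
        with tarfile.open(src_tar) as tar:
            # Only do this with tarballs you created yourself or trust.
            tar.extractall(workdir)
        job(workdir)  # all the small-file I/O happens in RAM
        with tarfile.open(dst_tar, "w:gz") as tar:
            tar.add(workdir, arcname=".")
    finally:
        # "Drop the tmpfs": here we only remove our scratch directory.
        shutil.rmtree(workdir)
```

The job argument is any callable that takes the scratch directory path, so the header-reading pass would go there.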

  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique de Moraes Holschuh <hmh@debian.org>
