
Re: A success story with apt and rsync



On Sun, Jul 06, 2003 at 11:36:34PM +0100, Andrew Suffield wrote:
> 
> I can only presume this is new or obscure, since everything I tried
> had the traditional behaviour. Can't see how to turn it on, either.
> 

It's new for 2.5.  Backports to 2.4 are available here:

	http://thunk.org/tytso/linux/extfs-2.4-update/extfs-update-2.4.21

For those who are interested, the broken out patches can be found here:

	http://thunk.org/tytso/linux/extfs-2.4-update/broken-out-2.4.21/to-apply

Once you have an htree-enabled kernel, you enable the feature on a
filesystem with the following command:

	tune2fs -O dir_index /dev/hdXX

Optionally, you can convert all existing directories to htrees by
using the command "e2fsck -fD /dev/hdXX".  Otherwise, only directories
that grow beyond a single block after you set the dir_index flag will
use htrees.  The dir_index feature is a fully compatible extension,
so it's perfectly safe to mount a filesystem with htrees on a
non-htree kernel.  A non-htree kernel will just ignore the htree
information, and if it modifies an htree directory, it will
invalidate the htree interior node information, so that the directory
stays unindexed until e2fsck -fD is run over the filesystem, which
optimizes all of the directories by reindexing them.

Why would you want to use htrees?  Because they speed up large
directories.  A lot.  Try creating 400,000 zero-length files in a
single directory.  It will take under 30 seconds with htree enabled,
and well over an hour without.
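
If you want to try this yourself, here is a rough sketch in Python
(my choice of language for illustration, not part of the kernel or
e2fsprogs tooling).  The helper name create_empty_files and the file
naming scheme are made up for the example.  It uses a temporary
directory, which may live on tmpfs, so point it at a directory on the
filesystem you actually want to measure.

```python
import os
import tempfile
import time


def create_empty_files(directory, count):
    """Create `count` zero-length files in `directory`; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        # O_CREAT | O_EXCL makes the kernel look the name up in the
        # directory and then insert it, once per file.
        path = os.path.join(directory, f"f{i:06d}")
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    # 1,000 keeps the demo quick; raise the count toward 400,000 to
    # see the difference between indexed and unindexed directories.
    with tempfile.TemporaryDirectory() as d:
        elapsed = create_empty_files(d, 1000)
        print(f"created 1000 files in {elapsed:.3f}s")
```

Timings will depend on the kernel, the disk, and the cache state, so
treat any numbers you get as indicative only.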

> > The good news is that this particular optimization of sorting by inode
> > number should work for all filesystems, and should speed up xfs as
> > well as ext2/3 with HTREE.
> 
> What about ext[23] without htree? Mucking with the order returned by
> readdir() has historically caused problems there...

It'll be fine; in fact, in some cases you'll see a slight speedup.
The key is that you'll get the best performance by reading/modifying
the inode data structures in sorted order by inode number.  This way,
you make a single sweep through the inode table, without needing any
extraneous seeks.  The natural order of readdir() on non-htree ext2/3
filesystems mostly approximates this, although once files have been
deleted from and created in the directory, this is no longer
guaranteed.  So sorting by inode number will never hurt, and may help.
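
A minimal sketch of the idea in Python, for illustration only; this
is not how apt or rsync implement it, and the function names
entries_by_inode and stat_all are hypothetical.  It reads the
directory once, sorts the entries by inode number, and only then
touches the inodes.

```python
import os


def entries_by_inode(directory):
    """Return the names in `directory`, sorted by inode number."""
    with os.scandir(directory) as it:
        # On Unix, DirEntry.inode() uses the d_ino value that readdir()
        # already returned, so no inode is read at this stage.
        pairs = [(entry.inode(), entry.name) for entry in it]
    pairs.sort()
    return [name for _, name in pairs]


def stat_all(directory):
    """lstat() every entry in inode order; return {name: size}."""
    sizes = {}
    for name in entries_by_inode(directory):
        # Visiting inodes in ascending order should approximate one
        # sweep through the inode table.
        st = os.lstat(os.path.join(directory, name))
        sizes[name] = st.st_size
    return sizes
```

The same ordering would apply to any per-file operation that reads
or modifies the inode (open, unlink, chmod), not just stat.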

					- Ted


