
Re: Backup from huge Maildirs

On Tue, Sep 04, 2007 at 09:02:53AM +0200, Marc Haber wrote:
> On Mon, Sep 03, 2007 at 08:27:17PM -0300, Henrique de Moraes Holschuh wrote:
> > On Mon, 03 Sep 2007, Marc Haber wrote:
> > > On Mon, Sep 03, 2007 at 07:13:28PM +0200, Oliver Hitz wrote:
> > > > ext3 is
> > > > much slower than reiserfs if you're handling directories containing huge
> > > > numbers of files.
> > > 
> > > Does this still hold for ext3 filesystems with directory hash enabled?
> > 
> > Yes, when you need to *enumerate* all files in a dir.  The hash trees make
> > it even worse, I think.
> Where is the advantage of reiserfs then?

not every filesystem has abysmal performance when you have many
thousands of files in one directory. ext2 and ext3 perform quite badly
in this situation.

two file systems that perform well are reiserfs and XFS.

i've used reiserfs in the past, but ended up having too many problems
with it (although it has improved since then). in short, i don't trust
it and see no compelling reason to try it again since XFS does what i

i use XFS as my "standard" fs these days. it's a good solid fs, with
many years of real-world usage and testing on SGI boxes even before it
was GPL-ed and ported to linux several years ago. highly reliable, and
fast. performs well for both extremely large files (e.g. video editing)
and many small files.

i have no idea how either would perform with millions of files in the
one directory, but i know they both perform well with many thousands
(e.g. Maildir and news spools).

to the OP: are they really all in the one directory, or are they in
multiple subdirectories/folders beneath that dir?

try an experiment.  create three filesystems (you could temporarily re-use
your swap partition if you don't have any spare partitions on your system),
formatted as ext2/ext3, reiserfs, and xfs.  time the creation, listing, and
deletion of, say, 10000 small files in a directory on each fs.
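
a minimal sketch of that timing test, assuming python is available on the
box. the function name bench_dir and the "fsbench-" prefix are made up for
illustration; pass it the mount point of each test filesystem. note that
it measures warm-cache behaviour unless you remount the filesystem between
phases.

```python
import os
import sys
import tempfile
import time


def bench_dir(base, count=10000):
    """Time creating, listing and deleting `count` small files under `base`."""
    # Work in a fresh subdirectory so nothing else on the fs is touched.
    workdir = tempfile.mkdtemp(prefix="fsbench-", dir=base)
    timings = {}

    # Phase 1: create many small files in one directory.
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(workdir, "msg%06d" % i), "w") as f:
            f.write("x\n")
    timings["create"] = time.perf_counter() - start

    # Phase 2: enumerate the directory and stat every entry.
    start = time.perf_counter()
    names = os.listdir(workdir)
    for name in names:
        os.stat(os.path.join(workdir, name))
    timings["list"] = time.perf_counter() - start

    # Phase 3: delete everything again.
    start = time.perf_counter()
    for name in names:
        os.unlink(os.path.join(workdir, name))
    timings["delete"] = time.perf_counter() - start

    os.rmdir(workdir)
    timings["files"] = len(names)
    return timings


if __name__ == "__main__":
    # Usage: python fsbench.py /mnt/ext3 /mnt/reiserfs /mnt/xfs
    for mountpoint in sys.argv[1:]:
        print(mountpoint, bench_dir(mountpoint))
```

the mount points in the usage comment are placeholders; use whatever
paths you mounted the three test filesystems on.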

ext2/ext3 will be horribly slow. reiserfs and XFS will both be fast, with
reiserfs probably being slightly faster than XFS (unless you forget to
disable the default 'file tails' option of reiserfs, which will save
space but slow it down).

so, switching to a better fs will solve part of the problem... but if
you're rsyncing ~60 user directories, each with over a million files
in them, then that's going to take time. rsync builds up a list of
changed/added/deleted files before copying anything, which means it
needs to stat all of the files on both systems.
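
to get a feel for how much of the backup time is just that scan, something
like the following rough sketch may help. it is not what rsync does
internally; it only walks a tree, stats every file, and reports how many
files sit in each directory and how long the walk took. the name scan_tree
is invented for this example.

```python
import os
import time


def scan_tree(root):
    """Stat every file under root.

    Returns (per_directory_counts, total_files, elapsed_seconds).
    """
    counts = {}
    total = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are not followed.
            os.lstat(os.path.join(dirpath, name))
        counts[dirpath] = len(filenames)
        total += len(filenames)
    return counts, total, time.perf_counter() - start
```

the per-directory counts also answer the question of whether the files
really are all in one directory or spread over subfolders.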

you may be better off modifying user behaviour, perhaps by having two
mail systems - the current system and an archive system for old mail,
and back up the archive system to tape using amanda (e.g. with a schedule
of weekly full and daily incremental backups). "current" mail (e.g. less
than a month old) can be accessed instantly as normal. older mail would
require accessing the archive system.
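
as a starting point for such a split, here is a hedged sketch that lists
the messages in one Maildir that are older than a cutoff. it makes two
assumptions: file mtime is a good enough stand-in for message age, and
only the top-level cur/ and new/ directories matter (subfolders are not
visited). old_messages is a hypothetical helper name, and the sketch only
lists candidates; it does not move or delete anything.

```python
import os
import time


def old_messages(maildir, days=30, now=None):
    """Yield paths of messages in maildir/cur and maildir/new older than `days`."""
    if now is None:
        now = time.time()
    cutoff = now - days * 86400
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        with os.scandir(folder) as entries:
            for entry in entries:
                if not entry.is_file(follow_symlinks=False):
                    continue
                # Assumption: mtime approximates the delivery time.
                if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                    yield entry.path
```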


craig sanders <cas@taz.net.au>
