reiserfs empirical study (very long)

[I am cross-posting this to the sparc list even though I haven't had
a chance to get this working on sparc yet. This is because 2.4
kernels are a bit harder to come by for sparc32 than they are for
powerpc right now. But someday in the near future, they might be
easier to get, and then this would be useful(?) information.]

There has been some FUD about reiserfs on this [powerpc] list, so I
thought it might be useful to gather some firsthand information and
experience and make it available.

First, there has been some confusion about reiserfsck and e2fsck.
The idea that reiserfsck doesn't work comes from the logic(?) that
it can't fix all possible and theoretical fs damage, so therefore it
doesn't work. This is true, but it is also true by the same logic
that e2fsck doesn't work. Or that they both work. Something that
is overlooked by this whole idea is that reiserfsck and e2fsck have
vastly different respective roles in relation to the main product.
e2fsck is a common and daily use utility which is an integral part
of ext2fs and similar UFS/FFS based file systems. Reiserfsck is an
adjunct and a last gasp that might fix some problems in the event of
a hypothetical bug that hypothetically might keep the normal
recovery mechanisms from being sufficient. If you find yourself in
a situation where the normal recovery mechanisms of reiserfs don't
work, the file system is most likely so fubared that reiserfsck
won't be able to do much. But it might. The point is that under
the same circumstances, e2fsck probably wouldn't be able to put
humpty back together again either. I would like to point out that
at no time in all the tests that I ran did reiserfsck run (in the
event that the log fails to replay, reiserfsck is run). Nor did
reiserfs suffer any strange problems like corruption or anything
else. BTW, reiserfs is not considered to be a more reliable file
system, but rather more resistant to crash damage than UFS based
file systems.

* linux kernel 2.4.5-pre3 or something like that, from the famous
  benh rsync kernel tree.
* endian patches for the kernel and reiserfsprogs
* Debian potato distribution
* Powermac 8500, 2 x 604e PowerPC processors
* various hard drives:
  + on external 53C94 scsi controller (5MB/s async)
    1) an old 1.2GB seagate
    2) a younger 2GB IBM
  + on internal MESH (10 MB/s sync)
    1) a 4GB Fujitsu

All the performance test results were verified and repeatable within
about 1% deviation, unlike the test results on a certain web site
which had differences ranging from 50 to 100% for the same test(!),
rendering them completely useless.
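
For anyone who wants to apply the same repeatability check to their
own runs, here is a minimal sketch. The post does not say how the
deviation was computed, so this assumes "largest distance of any run
from the mean, as a fraction of the mean". The function names and the
sample timings are made up for illustration.

```python
def max_relative_deviation(timings):
    """Largest deviation of any run from the mean, as a fraction of the mean."""
    mean = sum(timings) / len(timings)
    return max(abs(t - mean) for t in timings) / mean


def is_repeatable(timings, tolerance=0.01):
    """True when every run lies within `tolerance` (default 1%) of the mean."""
    return max_relative_deviation(timings) <= tolerance


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds for three runs of one test.
    runs = [100.0, 100.5, 99.6]
    print(is_repeatable(runs))
```

A set of runs that spreads 50% or more around its mean fails this
check by a wide margin, which is the complaint made above.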

The big endian patches change the code to use little endian ordering
for all on-disk structures. IMO this is a mistake, and certainly
costs a dear performance penalty, because on big endian processors,
this method requires converting endianness both ways (reading and
writing) for all meta data. I submit that there is little reason
for this, and the performance cost is not worth the very dubious
feature of having the file system be moveable to little endian
systems, like x86. Note that except in a few cases, the disk labels
alone would prevent this. I would very much like to see some endian
patches that don't have this effect. I believe that the large file
I/O performance and large directory tree copy performance would show
a definite increase. It may be too late now.
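
To make the byte order point concrete, here is a small sketch using
Python's struct module. The two-field record is a made-up stand-in and
NOT the real reiserfs on-disk layout, and the helper names are mine. It
only shows that a fixed little endian layout differs byte for byte
from what a big endian CPU holds natively, so every meta data read and
write on such a CPU needs a swap.

```python
import struct
import sys

# Hypothetical meta data record (not the real reiserfs layout):
# a 32-bit block number followed by a 16-bit item count.
LE_FORMAT = "<IH"  # fixed little endian on-disk order, as the patches use
BE_FORMAT = ">IH"  # big endian order, native to PowerPC and SPARC


def encode(block, count, fmt):
    """Serialize the record in the byte order named by `fmt`."""
    return struct.pack(fmt, block, count)


def decode(raw, fmt):
    """Parse the record back into (block, count)."""
    return struct.unpack(fmt, raw)


def needs_swap(fmt, host_order=sys.byteorder):
    """True when the on-disk order differs from the host CPU's order."""
    disk_order = "little" if fmt.startswith("<") else "big"
    return disk_order != host_order


raw_le = encode(0x01020304, 0x0506, LE_FORMAT)
raw_be = encode(0x01020304, 0x0506, BE_FORMAT)
print(raw_le.hex())  # 040302010605
print(raw_be.hex())  # 010203040506
print(needs_swap(LE_FORMAT, host_order="big"))  # True
print(needs_swap(BE_FORMAT, host_order="big"))  # False
```

Whether the swap costs enough to show up in the benchmarks is the
claim made above, not something this sketch measures.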

It should be noted that 2.4 was extremely unstable. Random lockups
were common, at least one per 24 hours, and often just after getting
the login prompt after boot up. These lockups weren't related to
reiserfs (or ext2), because they occurred regardless of whether
reiserfs.o was even loaded. In fact almost all the lockups occurred
during periods of no activity.

In the area of large file creation and I/O, ext2 was the winner, not
by much, but repeatable and significant. This was backed up by
results from the bonnie disk benchmark, showing a slight but
significant edge to ext2fs. The bonnie benchmark is in many ways a
large file I/O test. Bonnie's disk I/O test using character I/O is
of no use because it's CPU bound, meaning the results are pretty
much the same for both file systems. This large file advantage can
be traced to the "extent" aspect of disk space allocation that is
very efficient. Reiserfs would do well to adopt something like it.

In the area of file creation and directory manipulation, reiserfs
won by a huge margin. Explicitly, after 2-3 hours, the "create 400k
files of size 0 in a directory" test caused the system to thrash
itself into uselessness on an ext2 file system, ultimately creating
less than half that many files.
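
A rough sketch of what a "create N files of size 0 in a directory"
test can look like follows. This is not the program used for the
numbers above; the function name, the file naming scheme and the small
default count are assumptions for illustration. Point it at a
directory on the file system under test and raise the count to
something like 400000 to approximate the test described.

```python
import os
import tempfile
import time


def create_empty_files(directory, count):
    """Create `count` zero-length files in `directory`; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:06d}")
        # O_EXCL makes the call fail rather than silently reuse a name.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    # A temporary directory keeps the demo harmless; a real run would
    # use a directory on the ext2 or reiserfs partition being tested.
    with tempfile.TemporaryDirectory() as d:
        elapsed = create_empty_files(d, 1000)
        print(f"created {len(os.listdir(d))} files in {elapsed:.3f}s")
```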

In the area of copying a large directory tree using two tar
processes, it was almost a dead heat, but with reiserfs winning by a
toe nail clipping. This minuscule advantage was probably due to the
extremely fast performance on creation of a large number of small
files, which came in handy when copying include directories from
kernel source trees.
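
The exact commands for the two-tar copy are not given here, so the
sketch below assumes the classic pipeline
"(cd src && tar cf - .) | (cd dst && tar xf -)" and times it. The
function name is mine, and it expects a tar binary on the PATH.

```python
import os
import subprocess
import tempfile
import time


def copy_tree_with_tar(src, dst):
    """Copy the tree under src into dst using two piped tar processes.

    Returns the elapsed wall-clock time in seconds.
    """
    os.makedirs(dst, exist_ok=True)
    start = time.perf_counter()
    packer = subprocess.Popen(["tar", "cf", "-", "."], cwd=src,
                              stdout=subprocess.PIPE)
    unpacker = subprocess.Popen(["tar", "xf", "-"], cwd=dst,
                                stdin=packer.stdout)
    # Drop our copy of the pipe so the packer gets SIGPIPE if the
    # unpacker exits early.
    packer.stdout.close()
    unpack_status = unpacker.wait()
    pack_status = packer.wait()
    if pack_status != 0 or unpack_status != 0:
        raise RuntimeError(
            f"tar failed: pack={pack_status}, unpack={unpack_status}")
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, \
         tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "include"))
        with open(os.path.join(src, "include", "a.h"), "w") as f:
            f.write("/* sample */\n")
        print(f"copied in {copy_tree_with_tar(src, dst):.3f}s")
```

For a fair comparison the source and destination should sit on the
file system being measured, and the cache state should be the same
before each run.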

And last but most, the catastrophic failure test. As you might
expect, reiserfs kicked butt on this one. Actually, reiserfs was a
ray of sunshine because the system had a tendency to not be working
if left alone over night. Ext2 file systems suffered some serious,
messed up damage from these lockups, requiring several libraries and
other core files to be reinstalled. If I was going to run 2.4.x
right now, I would probably make reiserfs the root file system for
that reason alone. One note however: when the cord is yanked from
a reiserfs file system, it causes the kernel to hang. Permanently.
Ext2 did not have this problem. A note will be sent to reiserfs
about that. Caution: don't try this test yourself. One of my disks
lost its low level formatting when doing this. Other severe
hardware failures to either drive(s) or computer(s) could result as
well. Don't try this at home!

Next installment: accurate depiction of Linux kernel development
politics and reiserfs.