
Re: Best file system



On Sat, 04 Feb 2006, Craig Sanders wrote:
> i guess that depends on usage. i've had a number of power failures on
> my workstation boxes, even on my main workstation at home (which also
> doubles as mail server, web server, file server, etc etc etc), without
> getting corruption. i've lost a few files, but the file system was
> OK. disk usage on this machine is usually light, but it varies a lot

Barring kernel bugs, which I have seen only on 2.4.whatever, a LONG time ago
when we still had to patch kernels to get XFS, you will only get XFS
filesystem corruption if you have bad hardware (e.g. bad memory or disks).

XFS does quite well at recovering the filesystem metadata after a crash. You
get all inodes, directories, etc.  But some of the files will contain a bunch
of zeros instead of real data, and what pisses me off is that you have to
manually hunt down the files that were damaged this way.

Presumably, anything that makes judicious use of fsync() where appropriate
will have less chance of corruption (the data loss window is much shorter
than the several seconds that delayed write allocation can cause).
MTAs are known to get this right :)
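
To make the fsync() point concrete, here is a minimal sketch in Python of the
write-then-fsync pattern. The helper name durable_write is hypothetical and
the code is not taken from any MTA; it only illustrates the idea and assumes
a POSIX system where a directory can be opened and fsync()ed (as on Linux).

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path so they are on stable storage when this returns.

    Hypothetical helper for illustration only.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, so the final
    # rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to push it to the disk
        os.replace(tmp, path)      # atomically put the new file in place
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # fsync the directory as well, so the new name itself is committed.
    dirfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

Called as durable_write("/some/spool/msg", b"..."), it does not return until
the kernel has been asked to commit both the data and the directory entry, so
the loss window no longer depends on when the filesystem gets around to
flushing delayed writes.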

> depending on what i've doing. i've got a UPS for it now, and can't
> remember the last time i had a kernel oops on it.

Same here on the production machines, but I have learned not to trust
kernels at all until they have run for a few weeks on a machine that does
not have a shadow of XFS in it (just plain old reliable ext3, which doesn't
lose data) :-)

> for big servers, use a UPS (of course), and a raid controller with a
> large non-volatile write cache. having the journal log on a separate
> disk speeds up writes by avoiding seeks....perhaps even use a SSD or
> non-volatile ramdisk card for the journal.

Delayed writes in XFS make all of the above useless.  It simply doesn't
send the data to the disk, and that makes the potential data loss window an
order of magnitude (or more) bigger than the time it takes to flush data to
the disk with such a setup.  It is quite possible, though, that one can tell
XFS to shorten or close the delayed-write window (I didn't check).

OTOH, if I am going to place the journal on non-volatile RAM, I'd go for
ext3 with data=journal.

> lot and to minimise the damage if/when it occurs. the faster the journal
> is committed to disk (or SSD or NV ramdisk), the less damage can be
> done.

Correct. But depending on the filesystem, that committing can take much more
time than you'd expect (unless you fsync()). If a filesystem delays the
commit of anything related to a fsync()ed file, that filesystem is utterly
broken.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


