
Re: XFS worth it?



On Sat, Apr 16, 2005 at 08:58:42AM +0200, Thomas Steffen wrote:
> What I have seen several times with XFS is that after a crash, files
> that were created just before the crash (like config files) are filled
> with zeros. It seems like the file is not flushed after close. If you
> try to get your 3D acceleration to work, chances are that it is your
> XF86Config-4 :-). It also "erased" my mozilla prefs.js on two
> occasions. It might be a coincidence, but it never happened to me on
> reiserfs.

I've seen that too, but only while I was having the other problems as
described in my previous message in this thread.  (My newfound
stability includes an absence of files containing nothing but zeros.)
Consider the following sequence, keeping in mind that I'm talking out
of my ass here (although many people have told me I'm a smart ass):

1) mozilla truncates the "old" file
2) mozilla writes the "new" file
   a) XFS writes the metadata allocating space for the data and
      indicating the size of the new file
   b) XFS writes the data to the allocated space

If the system crashes after (a) gets sync'd but before (b) gets
sync'd, the file has its new size but its data never reached the
disk, so it reads back as zeros.  That behavior is a valid design
choice.  I think the problem you are describing is a "feature".

Keep in mind that journaling filesystems only guarantee filesystem
consistency.  Filesystem consistency is a necessary but not a
sufficient condition for data consistency.  If you want data
consistency from an application, it needs to behave in a way that
guarantees data consistency.  I haven't looked, but I'd bet you a
nontrivial sum of money that mozilla doesn't behave in a way that
guarantees data consistency of the prefs.js file.  (The proper
sequence is to write to a temp file, close the temp file, and then do
a rename() of the new temp file over the old file.)
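
As a rough illustration of that sequence, here is a minimal Python
sketch.  The name atomic_write is made up for this example, and the
fsync() before the rename is an extra step I am assuming you want; it
is not part of the sequence described above.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data (bytes) via a temp file and a rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory as the target so the
    # rename never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # application buffer -> kernel
            os.fsync(tmp.fileno())  # assumption: extra step, kernel -> hardware
        # The temp file is closed here; now rename it over the old file.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader of the target path sees either the complete old file or the
complete new file, because the old file is never truncated in place.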

Also, I think there is a flaw in your assumptions.  You are confusing
flushing with syncing.  Flushing just makes sure the data has been
handed off from the application to the kernel.  Syncing makes sure
that the kernel has handed off to the hardware.  (And depending on how
buffering is handled in the hardware, a power failure may still allow
data to be lost even if it has been sync'd.)  The distinction between
flushing and syncing exists because automatically syncing every time a
file is closed ruins your write-back caching performance.  It's left
to the application to decide if a sync is required or not.  Most apps
don't sync, and that's an appropriate decision.  DB servers, mail
servers, etc., *do* explicitly sync, which is also an appropriate
decision.  If you are about to do something which might crash the
system (like loading X with experimental settings), run /bin/sync
first.
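
To show the two steps side by side, here is a small Python sketch.
The function name write_durably is hypothetical; flush() and
os.fsync() are the standard calls for the two hand-offs described
above.

```python
import os


def write_durably(path, text):
    """Write text to path, then flush and sync before closing."""
    with open(path, "w") as f:
        f.write(text)
        # Flush: hand the data from the application's buffer to the kernel.
        f.flush()
        # Sync: ask the kernel to hand the data off to the hardware.
        os.fsync(f.fileno())
    return os.path.getsize(path)
```

Leaving out the os.fsync() call gives the usual behavior of most
apps: the data sits in the kernel's write-back cache until the kernel
decides to write it out.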

All that said, this may not be a common feature of all journaling
filesystems.  (Or, probably more accurately, the window of opportunity
for experiencing this particular failure may be wider for XFS than for
some other filesystem.)  Those other filesystems may, however, have
other interesting failure modes.

Having defended the design choice to death, I will say that I find it
really annoying that the file exists and is of non-trivial length,
just full of zeros.  That's a harder condition to detect than
presence/absence of the file, or the presence of a zero-length file.
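
For what it's worth, checking for that condition means reading the
whole file.  A minimal Python sketch, where the name is_all_zeros and
the chunk size are just illustrative choices:

```python
import os


def is_all_zeros(path, chunk_size=65536):
    """Return True if path is non-empty and every byte in it is zero."""
    if os.path.getsize(path) == 0:
        return False  # a zero-length file is the separate, easier case
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True  # reached end of file without a nonzero byte
            if chunk.count(0) != len(chunk):
                return False
```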

> But I do wonder, because fewer and fewer people seem to use XFS, and SGI
> isn't doing so well either. Without them, XFS would probably decay
> pretty rapidly?

Why would it decay?  It isn't like bacteria attack code which remains
unmodified. :) It might fail to evolve, which in my experience is
really what the term "bitrot" describes.  (The rest of the world moves
forward in an incompatible way, while the "rotting" bits stand still.)
Combine an unmaintained nontrivial piece of code with the Linux
kernel, which has a history of moving forward in incompatible ways,
and hmmm, maybe your concern isn't misplaced.  Hmmm.  My gut feeling
is that there is enough community interest that someone would
pick up the slack, but that may just be wishful thinking.  At least
the sort of changes which would kill off a file system are infrequent,
and avoidable (lots of people still run 2.4, or even 2.2).

Brian
-- 
It is, of course, better to know useless
things than to know nothing.  -- Seneca


