Re: File system corruption

To: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Cc: John Tobey <spam@john-edwin-tobey.org>, debian-hurd@lists.debian.org
Subject: Re: File system corruption
From: Roland McGrath <roland@frob.com>
Date: Sun, 3 Oct 1999 16:51:29 -0400
Message-id: <199910032051.QAA01989@frob.com>
In-reply-to: Marcus Brinkmann's message of Sun, 3 October 1999 17:45:33 +0200 <19991003174533.L12483@ulysses.dhis.org>

> But then, I just saw that Roland has checked in a fix. I will upload fresh
> packages tonight, stay tuned.

A word about the changes I made last night.  

I did fix a bona fide bug in libstore (one introduced by my 1999-09-09
change).  This is a bug that would not show up when using a whole disk
partition, but only when using something like a noncontiguous file as the
store.  So, ironically, this is not the bug that has been biting people,
but if you made a test filesystem in a file to try to isolate that bug (as
I and others have recommended several times), then you are quite likely to
have hit this bug.  This libstore bug caused reading and writing of the
wrong disk blocks, so you would have effectively seen garbage on your disk
from it.  I am pretty sure that this libstore bug could not have affected
any filesystem using a whole disk partition.
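
To illustrate why only noncontiguous stores are exposed to this kind of
bug, here is a minimal sketch of mapping a logical block of the store onto
the underlying device through a list of runs.  It is not the libstore
code and does not reproduce the actual bug; `struct run' and
`logical_to_physical' are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: LENGTH consecutive blocks of the store
   that live at block START of the underlying device.  */
struct run
{
  int64_t start;
  int64_t length;
};

/* Translate logical block ADDR of the store into a block number on the
   underlying device by walking the run list.  Returns -1 if ADDR lies
   past the end of the store.  */
static int64_t
logical_to_physical (const struct run *runs, size_t nruns, int64_t addr)
{
  assert (addr >= 0);
  for (size_t i = 0; i < nruns; i++)
    {
      if (addr < runs[i].length)
        return runs[i].start + addr;    /* ADDR falls inside this run.  */
      addr -= runs[i].length;           /* Skip past this run.  */
    }
  return -1;
}
```

A whole partition is a single run, so the loop never gets past its first
iteration and a mistake in the run-skipping arithmetic cannot show up.  A
file scattered over the disk has several runs, and any slip there sends
the read or write to the wrong block.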

I also checked in several changes to ext2fs.  I already posted the fix for
the "bad type in directory" problem.  As I said then, I am pretty sure that
this bug could not have any bad effects other than producing that warning
message.

The other changes are just cleanups that might be slightly beneficial for
performance, but I did not find any bugs that they address.  (Incidentally,
I added support for the --sblock option to use an alternate superblock.)  

I did fix the `group_desc' function yesterday (and I think I posted that
fix separately); it was indeed bogus for filesystem blocksizes other than
1k, and was responsible for the "no root inode" panic with such
filesystems.  But, again, I am pretty sure that this bug would only bite
filesystems with a blocksize other than 1k, and since it bit them totally
with a panic on startup, you could not have had this problem and not known
it.
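
For reference, here is a sketch of the arithmetic that locates a group
descriptor in the classic ext2 layout, which shows where 1k blocks differ
from larger ones.  It is not the ext2fs `group_desc' function and makes no
claim about what the fixed bug was; `locate_group_desc' and `struct
desc_loc' are hypothetical names, and the 32-byte descriptor size is
assumed from the classic on-disk format.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_DESC_SIZE 32u  /* Bytes per classic ext2 group descriptor.  */

struct desc_loc
{
  uint32_t block;   /* Filesystem block holding the descriptor.  */
  uint32_t offset;  /* Byte offset of the descriptor within that block.  */
};

/* Locate the descriptor of block group GROUP for a given BLOCK_SIZE.  */
static struct desc_loc
locate_group_desc (uint32_t block_size, uint32_t group)
{
  /* Block size must be a power of two, at least 1k.  */
  assert (block_size >= 1024 && (block_size & (block_size - 1)) == 0);

  /* The superblock sits at byte 1024 of the device: that is block 1 with
     1k blocks, but inside block 0 with any larger block size.  Real code
     should read s_first_data_block from the superblock instead.  */
  uint32_t first_data_block = (block_size == 1024) ? 1 : 0;
  uint32_t desc_per_block = block_size / GROUP_DESC_SIZE;

  struct desc_loc loc;
  /* The descriptor table starts in the block after the superblock's.  */
  loc.block = first_data_block + 1 + group / desc_per_block;
  loc.offset = (group % desc_per_block) * GROUP_DESC_SIZE;
  return loc;
}
```

With 1k blocks the table starts at block 2; with 4k blocks it starts at
block 1.  Code that hardwires the 1k case reads the wrong block on every
other blocksize, and a failure of that kind would be visible at startup.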

So, yes, I have fixed some actual bugs.  However, if you are using a whole
disk partition and have seen a problem other than "bad type in directory"
or "panic: no root inode", then I have not fixed anything that explains it,
and I suspect those bugs are still lurking there.

Anyone trying to track down filesystem bugs should certainly update to the
new code, just to make it easier to keep track of things.  But I would
expect you to still see the same symptoms if they are not among the
specific cases mentioned above.  When you do see a problem, it is important
to show me any messages you get as exactly as possible (magic numbers and
all).  Also, immediately after the crash, run `e2fsck -n' on your
filesystem and send me the whole output (if it says anything but "all ok"
or "clean flag not set"); it ought to work equally well to run e2fsck from
Hurd or Linux.  Another thing people might try is recompiling ext2fs with
-DEXT2FS_DEBUG, and then starting it with -D (or using fsysopts /fs -D to
toggle); that should produce some (perhaps voluminous) debugging output
while ext2fs runs (I haven't tried this myself).  The debugging messages
that come out around the time of a crash or data corruption might help
me guess where the problem lies.
