
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

Arnt Karlsen wrote:

..and after a journal death and fsck, the RAID set will be able to re-establish itself, no? Or does the journal cover both/all disks in a RAID set?

The FS doesn't know or care about RAID-anything, as far as I know. Doesn't the FS just tell /dev/hda1, /dev/sda1, or /dev/md1 to "write this data to this block"? Very oversimplified, I know, but it doesn't seem like RAID should be part of the discussion here (aside from the fact that a RAID1 or RAID5 config *may* reduce the occurrence of problems that would bring journaling into play).

..how does the journalling system choose which blocks to work from?
From what I've been able to see, the journal dies when its superblocks
go bad?

The filesystem needs the superblock in order to find the journal.  If
you have a single gigantic filesystem mounted on /, then if the
primary superblock is corrupted, the kernel will not be able to mount
/, and you're hosed.  E2fsck will automatically try the primary
superblock, and if that is corrupt, it will try the first backup
superblock.  If that is corrupted as well, a human will need to
manually try one of the other backup superblocks.
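To make "manually try one of the other backup superblocks" concrete, here is a minimal sketch of where those backups live. It assumes an ext2/ext3 filesystem created with the sparse_super feature (backups only in group 1 and in groups that are powers of 3, 5, or 7; without sparse_super every group carries a copy). The geometry values (blocks per group, first data block, group count) are assumptions you would read from `dumpe2fs` or `mke2fs -n` on the real device, and the function names are hypothetical helpers for illustration only.

```python
from typing import List


def is_backup_group(group: int) -> bool:
    """True if block group `group` holds a superblock copy under sparse_super.

    Assumption: sparse_super layout. Group 0 holds the primary superblock;
    backups sit in group 1 and in groups that are powers of 3, 5, or 7.
    """
    if group <= 1:
        return True
    for base in (3, 5, 7):
        n = base
        while n < group:  # climb base, base^2, base^3, ... until >= group
            n *= base
        if n == group:
            return True
    return False


def backup_superblocks(blocks_per_group: int, first_data_block: int,
                       group_count: int) -> List[int]:
    """Block numbers of the backup superblocks (group 0, the primary, excluded).

    A group's superblock sits at the start of the group:
    group * blocks_per_group + first_data_block.
    """
    return [g * blocks_per_group + first_data_block
            for g in range(1, group_count)
            if is_backup_group(g)]


if __name__ == "__main__":
    # Typical 1k-block geometry (8192 blocks/group, data starts at block 1).
    print(backup_superblocks(8192, 1, 10))
    # Typical 4k-block geometry (32768 blocks/group, data starts at block 0).
    print(backup_superblocks(32768, 0, 10))
```

With those geometries the first backups come out at 8193 and 32768 respectively, which are the values usually handed to `e2fsck -b`; later entries in each list are the further fallbacks a human would try in turn.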

..can this be tuned to try more blocks before whining for manpower?

Ted will know a lot more about this than I do, but I'd think that if the first two superblocks are corrupt, the likelihood of superblock number 3 (or whichever) being good is pretty low compared to the odds that the drive/partition is shot. Perhaps that's why e2fsck just gives up on the extra superblocks? Of course, then why bother including them?

I've had a bunch of Debian systems running on various (sometimes crappy) hardware for years. I've seen very few cases where a superblock was corrupt and e2fsck puked. In each case, it was on a drive that was old enough that it wasn't worth fussing over any more, so I just replaced the drive. Some of the drives are happy running on wintel boxes, others are just paperweights.

If your primary superblock is getting corrupted often, then first of
all, you should try to figure out why this is happening, and take
affirmative action to prevent it.  (The fact that you're reporting
marginal power is supremely suspicious; marginal power can cause disk
corruptions very easily.  Getting higher quality power supplies will
help, but a UPS is the first thing I would get.)

..yeah, I'm working on the power bit.  ;-)

Secondly, you're better off using a small root filesystem that
generally isn't modified often.  What I normally do is use a 128 meg
root filesystem, with a separate /var partition (or /var symlinked to
/usr/var), and /tmp as a ram disk.  With the root filesystem rarely
changing, it's much less likely that it will be corrupted due to
hardware problems.  Then the root filesystem can come up, and e2fsck
can repair the other filesystems.

..yeah, except for /tmp on ramdisk, that's how I do my boxes, and my ISP business client is learning his lesson good. ;-)

But I repeat, your filesystems shouldn't be getting corrupted in the
first place.  Using a separate root filesystem is a good idea, and
will help you recover from hardware problems, but your primary
priority should be to avoid the hardware problems in the first place.

						- Ted



Rich Puhek
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746

tel:   218.262.1130
email: rpuhek@etnsystems.com
