Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
Arnt Karlsen wrote:
..and after a journal death, and fsck, the raid set will be able
to re-establish itself, no? Or does the journal do both/all disks
in a raid set?
The FS doesn't know or care about RAID-anything, as far as I know.
Doesn't the FS just tell /dev/hda1, /dev/sda1, or /dev/md1 to "write
this data to this block"? Very oversimplified, I know, but it doesn't
seem like RAID should be part of the discussion here (aside from the
fact that a RAID1 or RAID5 config *may* reduce the occurrence of problems
that would bring journaling into play).
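
To make the layering concrete, here is a toy model in Python. It is only a
sketch of the idea, not how the kernel's md driver is written: the names
Disk, Mirror and fs_write are made up for illustration. The point is that
the "filesystem" calls the same write on whatever device it is handed, and
a RAID1-style mirror is what fans the write out to every member disk.

```python
class Disk:
    """Toy block device: a dict from block number to bytes (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def write_block(self, number, data):
        self.blocks[number] = data

    def read_block(self, number):
        return self.blocks[number]


class Mirror:
    """Toy RAID1: same interface as Disk, but every write goes to all members."""

    def __init__(self, *members):
        self.members = members

    def write_block(self, number, data):
        for member in self.members:
            member.write_block(number, data)

    def read_block(self, number):
        # Any member will do in this toy model; use the first one.
        return self.members[0].read_block(number)


def fs_write(device, number, data):
    """Stand-in for the filesystem: it sees 'a device' and nothing below it."""
    device.write_block(number, data)
```

In this model a journal write sent to the mirror lands on both disks
without the filesystem doing anything special, e.g.
fs_write(Mirror(a, b), 7, b"journal") leaves block 7 identical on a and b.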
..how does the journalling system choose which blocks to work from?
What I've been able to see, the journal dies when their super blocks
The filesystem needs the superblock in order to find the journal. If
you have a single gigantic filesystem mounted on /, then if the
primary superblock is corrupted, the kernel will not be able to mount
/, and you're hosed. E2fsck will automatically try the primary
superblock, and if that is corrupt, it will try the first backup
superblock. If that is corrupted as well, a human will need to manually
try one of the other backup superblocks.
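
For the manual step, e2fsck takes the location of an alternate superblock
via its -b option. As a rough sketch of where the backups usually live,
the Python below computes candidate block numbers. It assumes the
sparse_super feature is on (backups in group 1 and in groups that are
powers of 3, 5 and 7) and the default of 8 * block_size blocks per group;
the function name is made up for illustration. A filesystem created with
other options will differ, so treat the real layout reported by
"mke2fs -n" or dumpe2fs as authoritative.

```python
def backup_superblock_blocks(total_blocks, block_size=4096):
    """Return likely backup superblock block numbers for an ext2/ext3 fs.

    Assumes sparse_super and the default blocks-per-group (8 * block_size).
    """
    blocks_per_group = 8 * block_size
    # With 1 KiB blocks the first data block is block 1, otherwise block 0.
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division: number of block groups in the filesystem.
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)

    # Group 0 holds the primary superblock; backups sit in group 1 and in
    # groups whose number is a power of 3, 5 or 7.
    groups = {1}
    for base in (3, 5, 7):
        group = base
        while group < group_count:
            groups.add(group)
            group *= base

    return [
        group * blocks_per_group + first_data_block
        for group in sorted(groups)
        if group < group_count
    ]
```

For a 4 KiB-block filesystem of 1,000,000 blocks this gives 32768, 98304,
163840, 229376, 294912, 819200 and 884736, any of which could be handed to
"e2fsck -b" in turn.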
..this can be tuned to try more blocks before whining for manpower?
Ted will know a lot more about this than I do, but I'd think that if the
first two superblocks are corrupt, the likelihood of superblock number 3
or whatever being good is pretty low compared to the odds that the
drive/partition is shot. Perhaps that's why e2fsck just gives up on the
extra superblocks? Of course, then why bother including them?
I've had a bunch of Debian systems running on various (sometimes crappy)
hardware for years. I've seen very few cases where a superblock was
corrupt and e2fsck puked. In each case, it was on a drive that was old
enough that it wasn't worth fussing over any more, so I just replaced
the drive. Some of the drives are happy running on wintel boxes, others
are just paperweights.
If your primary superblock is getting corrupted often, then first of
all, you should try to figure out why this is happening, and take
affirmative action to prevent it. (The fact that you're reporting
marginal power is supremely suspicious; marginal power can cause disk
corruptions very easily. Getting higher quality power supplies will
help, but a UPS is the first thing I would get.)
..yeah, I'm working on the power bit. ;-)
Secondly, you're better off using a small root filesystem that
generally isn't modified often. What I normally do is use a 128 meg
root filesystem, with a separate /var partition (or /var symlinked to
/usr/var), and /tmp as a ram disk. With the root filesystem rarely
changing, it's much less likely that it will be corrupted due to
hardware problems. Then the root filesystem can come up, and e2fsck
can repair the other filesystems.
..yeah, except for /tmp on ramdisk, that's how I do my boxes,
and my isp business client is learning his lesson good. ;-)
But I repeat, your filesystems shouldn't be getting corrupted in the
first place. Using a separate root filesystem is a good idea, and
will help you recover from hardware problems, but your primary
priority should be to avoid the hardware problems in the first place.
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746