..how does the journalling system choose which blocks to work from?
From what I've been able to see, the journal dies when the superblock
goes bad?
The filesystem needs the superblock in order to find the journal. If
you have a single gigantic filesystem mounted on /, then if the
primary superblock is corrupted, the kernel will not be able to mount
/, and you're hosed. E2fsck will automatically try the primary
superblock, and if that is corrupt, it will try the first backup
superblock. If that is corrupted as well, a human will need to
manually try one of the other backup superblocks.
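To make "the other backup superblocks" concrete, here is a minimal
sketch of where they live and of the try-them-in-order fallback. It
assumes the sparse_super feature, where copies are kept only in block
groups 0, 1, and powers of 3, 5 and 7. The function names and the
looks_valid predicate are hypothetical stand-ins (a real check would
read the block and verify the superblock magic number and fields), not
e2fsck's actual code.

```python
# Hypothetical sketch: superblock locations under sparse_super, and a
# fallback loop that walks every copy in order.

def has_super(group: int) -> bool:
    """True if this block group holds a superblock copy (sparse_super)."""
    if group <= 1:
        return True
    for base in (3, 5, 7):
        n = base
        while n < group:
            n *= base
        if n == group:
            return True
    return False


def superblock_locations(group_count, blocks_per_group, first_data_block):
    """Primary superblock first, then the backups in on-disk order,
    as block numbers in filesystem-block units."""
    return [first_data_block + g * blocks_per_group
            for g in range(group_count) if has_super(g)]


def find_good_superblock(locations, looks_valid):
    """Return the first location whose superblock passes looks_valid
    (a hypothetical check), or None if every copy is bad."""
    for block in locations:
        if looks_valid(block):
            return block
    return None


# Example: 4K blocks (32768 blocks per group, first data block 0), 50 groups.
locs = superblock_locations(50, 32768, 0)
print(locs[1:])  # the backup copies
```

e2fsck itself, as described above, stops after the first backup; the
loop here is what the human ends up doing by hand, passing each
candidate block to `e2fsck -b`. On a real disk, running `mke2fs -n`
with the same options used to create the filesystem prints the actual
backup locations without writing anything.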
..this can be tuned to try more blocks before whining for manpower?
If your primary superblock is getting corrupted often, then first of
all, you should try to figure out why this is happening, and take
affirmative action to prevent it. (The fact that you're reporting
marginal power is supremely suspicious; marginal power can cause disk
corruptions very easily. Getting higher quality power supplies will
help, but a UPS is the first thing I would get.)
..yeah, I'm working on the power bit. ;-)
Secondly, you're better off using a small root filesystem that
generally isn't modified often. What I normally do is use a 128 meg
root filesystem, with a separate /var partition (or /var symlinked to
/usr/var), and /tmp as a ram disk. With the root filesystem rarely
changing, it's much less likely that it will be corrupted due to
hardware problems. Then the root filesystem can come up, and e2fsck
can repair the other filesystems.
..yeah, except for /tmp on ramdisk, that's how I do my boxes,
and my isp business client is learning his lesson good. ;-)
But I repeat, your filesystems shouldn't be getting corrupted in the
first place. Using a separate root filesystem is a good idea, and
will help you recover from hardware problems, but your primary
priority should be to avoid the hardware problems in the first place.
- Ted