Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Mon, 8 Sep 2003 12:05:24 -0400,
Theodore Ts'o <email@example.com> wrote in message
> On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
> > > What happens on error conditions can be set through tune2fs or as
> > > a mount option. Having it remount read-only is probably better
> > > than panicking the kernel.
> > ..yeah, except in /var/log, /var/spool et al, I also lean towards
> > panic in /home.
> I tend to use remount read-only feature on desktops, where it's useful
> for me to be able to save my work on some other filesystem before I
> reboot my system.
..remount read-only is ok, as long as the bugle blows.
IME, it doesn't.
> But for an unattended server, most of the time it's probably better to
> force the system to reboot so you can restore service ASAP.
..even for raid-1 disks??? _Is_ there a combination of raid-1 and
journalling fs'es for linux that's ready for carrier grade service?
> > > When it happens a reboot may be a good idea, in which case a fsck
> > > to fix the problem should occur automatically.
> > ..should, agrrrRRRRRRRrrreed. IME (RH73 - RH9 and woody) it does
> > not.
> > ..what happens is the journaling dies, leaving a good fs intact,
> > on rebooting, the dead journal will "repair" the fs wiping good
> > data off the fs.
> I'm not sure what you mean by this. When there is a filesystem error
..add an "healthy" dose of irony to repair in "repair". ;-)
> detected, all writes to the filesystem are immediately aborted, which
...precludes reporting the error?
> means the filesystem on disk is left in an unstable state. (It may
> look consistent while the system is still running, but there is a lot
.._exactly_, but it is not reported to any of the system users.
A system reboot _is_ reported usefully to the system users, all
tty users get the news.
> of uncommitted data which has not been written out to disk.) So in
> general, not running the journal will leave you in a worse state after
> rebooting, compared to running the journal.
..it appears my experience disagrees with your expertise here.
With more data, I would have been able to advise intelligently
on when to and when not to run the journal. I believe we agree
that not running the journal is advisable if the system has been
left limping like this for a few hours.
> An alternative course of action, which we don't currently support
> would be to attempt to write everything to disk and quiesce the
> filesystem before remounting it read-only. The problem is that trying
> to flush everything out to disk might leave things in a worse state
> than just freezing all writes.
..could a ramdisk help? As in; store in ramdisk between journal
commits and honk the big horn on non-recoverable errors?
..and, on a raid-1 disk set, a failure oughtta cut off the one bad
fs and not shoot down the entire raid set because that one fs fails.
> The real problem is that in the face of filesystem corruption, by the
> time the filesystem notices that something is wrong, there may be
> significant damage that has already taken place. Some of it may
> already have been written to journal, in which case not replaying the
> journal might leave you with more data to recover; on the other hand,
> not replaying the journal could also risk leaving your filesystem very
> badly corrupted with data which the mail server had promised it had
> accepted, not actually getting saved by the filesystem.
> A human could make a read/write snapshot of the filesystem and try it
> both ways, but if you want automatic recovery, it's probably better to
> run the journal than not to run it.
..agreed, and with ext3 on a raid-1 set, this _oughtta_ be easy.
> > ..the errors=remount-ro fstab option remounts the fs ro but fails
> > to tell the system, so the system merrily "logs" data and "accepts"
> > mail etc 'till Dooms Day, and especially on raid-1 disks I sort of
> > expected redundancy, like in "autofeather the bad prop and trim out
> > the yaw" and "autopatch that holed fuel tank", and "auto-sync the
> > props", I mean, this was done _60_years_ ago in aviation to help
> > win WWII, and ext3 on raid-1 floats around USS Yorktown-style???
> If the system merrily logs data and accepts it, even after the
> filesystem is remounted read-only, that implies that the MTA is
> horribly buggy, not doing the most basic of error return code checks.
..agreed, any pointers or hints to such basics?
> If the filesystem is remounted read-only, then writes to the
> filesystem *will* return an error. If the application doesn't notice,
> then it's the application which is at fault, not ext3.
..on Woody, ext3 actually reports the remount to /dev/console. ;-)
_Nothing_ elsewhere. Dunno about Red Hat, never had one hooked
to a monitor upon a journal failure.
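
..for the "basic error return code checks" mentioned above, here is a
minimal sketch of what an application could do, assuming Python and a
POSIX system. The names store_message and temp_failure are hypothetical
and not taken from any real MTA; the only facts relied on are that
writes to a read-only filesystem fail with EROFS and that 451 is an
SMTP temporary failure code.

```python
import errno
import os


def temp_failure(exc):
    """Map an OSError to an SMTP-style temporary failure reply."""
    if exc.errno == errno.EROFS:
        # This is the error a write gets after a remount read-only.
        reason = "filesystem is read-only"
    else:
        reason = os.strerror(exc.errno) if exc.errno else str(exc)
    return (451, "local error in processing: " + reason)


def store_message(path, data):
    """Write data (bytes) to a new file and return (code, text).

    250 is returned only after open(), write() and fsync() have all
    succeeded; any OSError on the way becomes a 451 reply instead.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        return temp_failure(exc)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than asked for, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    except OSError as exc:
        return temp_failure(exc)
    finally:
        try:
            os.close(fd)
        except OSError:
            pass  # data was already fsync'ed or the failure was reported
    return (250, "OK")
```

An application that only ever reports success after this kind of check
cannot "accept" mail onto a filesystem that has gone read-only; it can
still only defer the mail, not keep it flowing.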
..all I know is RH-7.3-8-9 and Woody do _not_ report ext3 journal
failures in any way I am aware of and can make use of, other than
these wee sad hints in dumpe2fs:
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Filesystem state:         clean with errors
Errors behavior:          Continue
...and the cat /proc/mounts |grep " ro " output. Neither of these
warnings is made use of by the app or distro makers, AFAICT.
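
..a monitoring script could pick up both hints. What follows is a rough
sketch, assuming Python; read_only_mounts and dumpe2fs_state are
hypothetical helper names, and the input is assumed to be the text of
/proc/mounts and of the dumpe2fs header respectively. Unlike the
grep " ro " above, it also catches entries where ro is followed by
further options.

```python
def read_only_mounts(mounts_text):
    """Return (device, mountpoint, fstype) for every read-only mount.

    mounts_text is the content of /proc/mounts: one mount per line,
    whitespace-separated, with the options as the fourth field.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if "ro" in options.split(","):
            found.append((device, mountpoint, fstype))
    return found


def dumpe2fs_state(header_text):
    """Return the value of the 'Filesystem state' line, or None."""
    for line in header_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            return value.strip()
    return None


# Possible use, e.g. from cron (paths are for a Linux box):
#   with open("/proc/mounts") as f:
#       for dev, mnt, fstype in read_only_mounts(f.read()):
#           if fstype == "ext3":
#               print("WARNING: %s on %s is read-only" % (dev, mnt))
```

What to do with the warning (mail, syslog, wall) is left open here;
the point is only that both hints are easy to read from a script.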
..sparse_super is IMNTHOAIME _not_ worth the saved disk space,
and should _not_ be the default setup option.
..180 days is IMNTHOAIME _much_ too long between fsck's. Reboots
defeat the point of /usr/bin/uptime and cause downtime, too.
..for corporate etc fsck's I lean towards Friday afternoon
and for isp's I lean towards Monday or Tuesday mornings.
..IMNTHO, the fsck is "major" enough to warrant its own runlevel,
on stand-alone file systems. I use runlevel 4 for maintenance on
my remaining RH boxes.
> That being said, my preference for servers is to panic immediately on
> the first sign of trouble, and let the system fsck and come back
> again. Even if your MTA is non-criminally-negligent, and checks error
..as luck has it, I have no isp MTA's running on "my" boxes,
I've only lost my own mail and logs. Stupid luck, I know.
> codes, the best it can do is return a SMTP temporary failure, which
> still doesn't keep the mail flowing. You're probably best off
> rebooting the machine and restoring service.
..even with raid-1 disk sets??? (Oh, I _buy_ your reboot
advice for stand-alone fs'es and anything less than raid-1.)
..half way thru a looong day now, spent on a stop gap gateway
that died on a, guess what. Started doing the final gateway.
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
Scenarios always come in sets of three:
best case, worst case, and just in case.