
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..



On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
> > What happens on error conditions can be set through tune2fs or as a
> > mount option.  Having it remount read-only is probably better than
> > panicing the kernel.
> 
> ..yeah, except in /var/log, /var/spool et al, I also lean towards 
> panic in /home.

I tend to use the remount read-only feature on desktops, where it's useful
for me to be able to save my work on some other filesystem before I
reboot my system.  But for an unattended server, most of the time it's
probably better to force the system to reboot so you can restore
service ASAP.
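
As a rough illustration of where that choice lives: ext3 takes the
policy as an errors= mount option (continue, remount-ro, or panic).
The Python sketch below only pulls that value out of an option string
such as the fourth field of an fstab line.  The function name and the
fallback value are hypothetical; on a real system the fallback comes
from the superblock setting made with tune2fs -e, which this sketch
does not read.

```python
# Error policies that ext3 accepts for the errors= mount option.
KNOWN_POLICIES = ("continue", "remount-ro", "panic")


def error_policy(mount_options, default="continue"):
    """Return the errors= policy named in a comma-separated option string.

    `default` is an assumption standing in for the superblock default.
    """
    for option in mount_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "errors" and value in KNOWN_POLICIES:
            return value
    return default
```

An unattended server following the advice above would want this to
come back as "panic" for its spool filesystem.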

> > When it happens a reboot may be a good idea, in which case a fsck to
> > fix the problem should occur automatically.
> 
> ..should, agrrrRRRRRRRrrreed.  IME (RH73 - RH9 and woody) it does not.
> 
> ..what happens is the journaling dies, leaving a good fs intact, 
> on rebooting, the dead journal will "repair" the fs wiping good 
> data off the fs.

I'm not sure what you mean by this.  When there is a filesystem error
detected, all writes to the filesystem are immediately aborted, which
means the filesystem on disk is left in an unstable state.  (It may
look consistent while the system is still running, but there is a lot
of uncommitted data which has not been written out to disk.)  So in
general, not running the journal will leave you in a worse state after
rebooting, compared to running the journal.

An alternative course of action, which we don't currently support,
would be to attempt to write everything to disk and quiesce the
filesystem before remounting it read-only.  The problem is that trying
to flush everything out to disk might leave things in a worse state
than just freezing all writes.

The real problem is that in the face of filesystem corruption, by the
time the filesystem notices that something is wrong, there may be
significant damage that has already taken place.  Some of it may
already have been written to the journal, in which case not replaying
the journal might leave you with more data to recover.  On the other
hand, not replaying the journal also risks leaving your filesystem
very badly corrupted, with data which the mail server had promised it
had accepted never actually getting saved by the filesystem.

A human could make a read/write snapshot of the filesystem and try it
both ways, but if you want automatic recovery, it's probably better to
run the journal than not to run it.  

> ..the errors=remount,ro fstab option remounts the fs ro but fails 
> to tell the system, so the system merrily "logs" data and "accepts" 
> mail etc 'till Dooms Day, and especially on raid-1 disks I sort of 
> expected redundancy, like in "autofeather the bad prop and trim out 
> the yaw" and "autopatch that holed fuel tank", and "auto-sync the 
> props", I mean, this was done _60_years_ ago in aviation to help 
> win WWII, and ext3 on raid-1 floats around USS Yorktown-style???

If the system merrily logs data and accepts it, even after the
filesystem is remounted read-only, that implies that the MTA is
horribly buggy, not doing the most basic of error return code checks.
If the filesystem is remounted read-only, then writes to the
filesystem *will* return an error.  If the application doesn't notice,
then it's the application which is at fault, not ext3.
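
To make the point concrete, here is a minimal sketch in Python of the
kind of check an application should be doing.  The function name, the
spool path argument, and the reply texts are all hypothetical, and this
is not how any particular MTA is written.  It only shows that a failed
open, write, or fsync (for example EROFS after a read-only remount)
surfaces as an error the caller can turn into a temporary failure.

```python
import errno
import os


def spool_message(path, data):
    """Write a message durably and return an SMTP-style (code, text) pair.

    Returns (250, ...) only after the data has been written and fsync'ed;
    any OSError along the way becomes a 451 temporary failure.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            view = memoryview(data)
            while len(view) > 0:
                # os.write may write fewer bytes than asked; keep going.
                written = os.write(fd, view)
                view = view[written:]
            # Do not claim success until the kernel reports the data is on disk.
            os.fsync(fd)
        finally:
            os.close(fd)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            return (451, "spool filesystem is read-only")
        return (451, "local error while saving message")
    return (250, "message accepted")
```

The important part is that the 250 is sent only after fsync returns
without error; everything else is detail.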

That being said, my preference for servers is to panic immediately on
the first sign of trouble, and let the system fsck and come back
again.  Even if your MTA is non-criminally-negligent, and checks error
codes, the best it can do is return an SMTP temporary failure, which
still doesn't keep the mail flowing.  You're probably best off
rebooting the machine and restoring service.

						- Ted


