
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..



On Mon, 8 Sep 2003 12:05:24 -0400, 
Theodore Ts'o <tytso@mit.edu> wrote in message 
<20030908160524.GA13324@think>:

> On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
> > > What happens on error conditions can be set through tune2fs or as
> > > a mount option.  Having it remount read-only is probably better
> > > than panicking the kernel.
> > 
> > ..yeah, except in /var/log, /var/spool et al, I also lean towards 
> > panic in /home.
> 
> I tend to use remount read-only feature on desktops, where it's useful
> for me to be able to save my work on some other filesystem before I
> reboot my system. 

..remount read-only is ok, as long as the bugle blows.  
IME, it doesn't.

> But for an unattended server, most of the time it's probably better to
> force the system to reboot so you can restore service ASAP.

..even for raid-1 disks???  _Is_ there a combination of raid-1 and 
journalling fs'es for linux that's ready for carrier grade service?

> > > When it happens a reboot may be a good idea, in which case a fsck
> > > to fix the problem should occur automatically.
> > 
> > ..should, agrrrRRRRRRRrrreed.  IME (RH73 - RH9 and woody) it does
> > not.
> > 
> > ..what happens is the journaling dies, leaving a good fs intact; 
> > on rebooting, the dead journal will "repair" the fs, wiping good 
> > data off the fs.
> 
> I'm not sure what you mean by this.  When there is a filesystem error

..add a "healthy" dose of irony to the repair in "repair".  ;-)

> detected, all writes to the filesystem are immediately aborted, which

...precludes reporting the error?  

> means the filesystem on disk is left in an unstable state.  (It may
> look consistent while the system is still running, but there is a lot

.._exactly_, but it is not reported to any of the system users.  
A system reboot _is_ reported usefully to the system users, all 
tty users get the news.

> of uncommitted data which has not been written out to disk.)  So in
> general, not running the journal will leave you in a worse state after
> rebooting, compared to running the journal.

..it appears my experience disagrees with your expertise here.
With more data, I would have been able to advise intelligently 
on when to and when not to run the journal.  I believe we agree 
that not running the journal is advisable if the system has been 
left limping like this for a few hours.

> An alternative course of action, which we don't currently support
> would be to attempt to write everything to disk and quiesce the
> filesystem before remounting it read-only.  The problem is that trying
> to flush everything out to disk might leave things in a worse state
> than just freezing all writes.

..could a ramdisk help?  As in; store in ramdisk between journal 
commits and honk the big horn on non-recoverable errors?

..and, on a raid-1 disk set, a failure oughtta cut off the one bad 
fs and not shoot down the entire raid set because that one fs fails.

> The real problem is that in the face of filesystem corruption, by the
> time the filesystem notices that something is wrong, there may be
> significant damage that has already taken place.  Some of it may
> already have been written to journal, in which case not replaying the
> journal might leave you with more data to recover; on the other hand,
> not replaying the journal could also risk leaving your filesystem very
> badly corrupted with data which the mail server had promised it had
> accepted, not actually getting saved by the filesystem.
> 
> A human could make a read/write snapshot of the filesystem and try it
> both ways, but if you want automatic recovery, it's probably better to
> run the journal than not to run it.  

..agreed, and with ext3 on a raid-1 set, this _oughtta_ be easy.
 
> > ..the errors=remount-ro fstab option remounts the fs ro but fails 
> > to tell the system, so the system merrily "logs" data and "accepts" 
> > mail etc 'till Dooms Day, and especially on raid-1 disks I sort of 
> > expected redundancy, like in "autofeather the bad prop and trim out 
> > the yaw" and "autopatch that holed fuel tank", and "auto-sync the 
> > props", I mean, this was done _60_years_ ago in aviation to help 
> > win WWII, and ext3 on raid-1 floats around USS Yorktown-style???
> 
> If the system merrily logs data and accepts it, even after the
> filesystem is remounted read-only, that implies that the MTA is
> horribly buggy, not doing the most basic of error return code checks.

..agreed, any pointers or hints to such basics?
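
A minimal sketch of the error checking Ted describes, in Python and not
taken from any real MTA: once the fs is remounted read-only, open(),
write() and fsync() fail with EROFS, and the caller has to look at that
instead of assuming the data landed.  The names durable_write and explain
are hypothetical.

```python
import errno
import os


def explain(exc, path):
    """Turn an OSError into a short message, singling out EROFS."""
    if exc.errno == errno.EROFS:
        return "read-only filesystem: " + path
    return "write failed: " + os.strerror(exc.errno)


def durable_write(path, data):
    """Write bytes to path and fsync them.

    Returns None on success, or an error message if any step failed.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    try:
        fd = os.open(path, flags, 0o600)
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than asked for
                written = os.write(fd, view)
                view = view[written:]
            # fsync so the success we report means the data reached the disk
            os.fsync(fd)
        finally:
            os.close(fd)
    except OSError as exc:
        return explain(exc, path)
    return None
```

A caller that gets anything but None back from durable_write() has not
stored the message and should refuse it rather than "accept" it.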

> If the filesystem is remounted read-only, then writes to the
> filesystem *will* return an error.  If the application doesn't notice,
> then it's the application which is at fault, not ext3.

..on Woody, ext3 actually reports the remount to /dev/console.  ;-)
_Nothing_ elsewhere.  Dunno about Red Hat, never had one hooked 
to a monitor upon a journal failure. 

..all I know is RH-7.3-8-9 and Woody do _not_ report ext3 journal 
failures in any way I am aware of and can make use of, other than 
these wee sad hints in dumpe2fs:
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Filesystem state:         clean with errors
Errors behavior:          Continue

...and the cat /proc/mounts |grep " ro " output.  Neither of these 
warnings is made use of by the app or distro makers, AFAICT.
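
The two hints above can at least be checked by a script.  A rough sketch,
assuming the usual /proc/mounts layout (device, mount point, type,
options) and the "Filesystem state:" line as shown in the dumpe2fs output
above; the function names are made up for illustration, and how to blow
the bugle (mail, wall, pager) is left open.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is the content of /proc/mounts (or a file in that format).
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # compare whole options so that e.g. 'errors=remount-ro' does not match
        if "ro" in options.split(","):
            result.append(mount_point)
    return result


def filesystem_state(dumpe2fs_text):
    """Return the value of the 'Filesystem state:' line, or None if absent."""
    for line in dumpe2fs_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            return value.strip()
    return None


def has_errors(state):
    """True if a state string such as 'clean with errors' flags errors."""
    return state is not None and "with errors" in state
```

Fed from cron with the text of /proc/mounts and the header output of
dumpe2fs, anything returned by read_only_mounts() that fstab says should
be rw, or any state for which has_errors() is true, is worth a loud
warning to the admin.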

..sparse_super is IMNTHOAIME _not_ worth the saved disk space, 
and should _not_ be the default setup option.

..180 days is IMNTHOAIME _much_ too long between fsck's.  Reboots 
defeat the point of /usr/bin/uptime and cause downtime, too.

..for corporate etc fsck's I lean towards Friday afternoon 
and for isp's I lean towards Monday or Tuesday mornings.

..IMNTHO, the fsck is "major" enough to warrant its own runlevel, 
on stand-alone file systems.  I use runlevel 4 for maintenance on 
my remaining RH boxes.

> That being said, my preference for servers is to panic immediately on
> the first sign of trouble, and let the system fsck and come back
> again.  Even if your MTA is non-criminally-negligent, and checks error

..as luck has it, I have no isp MTA's running on "my" boxes, 
I've only lost my own mail and logs.   Stupid luck, I know.

> codes, the best it can do is return a SMTP temporary failure, which
> still doesn't keep the mail flowing.  You're probably best off
> rebooting the machine and restoring service.

..even with raid-1 disk sets???   (Oh, I _buy_ your reboot 
advice for stand-alone fs'es and anything less than raid-1.)

..half way thru a looong day now, spent on a stop gap gateway 
that died on a, guess what.  Started doing the final gateway.

-- 
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three: 
  best case, worst case, and just in case.


