
Re: fsck'd - FIXED



On 3/7/08, John Fleming <wa9als@gmail.com> wrote:
> On 3/7/08, Douglas A. Tutty <dtutty@porchlight.ca> wrote:
> > On Fri, Mar 07, 2008 at 08:29:08AM -0500, John Fleming wrote:
> > > Background - I had a well-established LAMP server that was giving some
> > > filesystem errors on boot, with the "hit control-D to continue or give root
> > > password to fix manually" message. It would go ahead and work normally if I
> > > hit Control-D.
> >
> > You should have fixed it manually.  Control-D is in case the person
> > sitting there doesn't have the root password and is generic to Debian's
> > single-user mode.
> >
> > > However, I wanted to try to get rid of the error and need for human
> > > intervention in the event of the need for a remote reboot, so I tried
> > > to fix the errors with fsck.  Somehow I ended up with a badly trashed
> > > filesystem and inability to reboot.
> > >
> > > After much gnashing of teeth and consideration of my options, I
> > > installed a new etch system.  I installed several benign packages like
> > > apache.  I ran update and dist-upgrade to bring the system up to date.
> > > When I ran the upgrade, it told me that it was trying to install an
> > > identical kernel image.
> >
> > Well, it was a new kernel image with the same version code so that the
> > new modules would be going in the same directory as the old modules.
> > The new modules won't work with the old kernel so that if you do
> > anything to trigger a module load, bad things can happen, which is why
> > you reboot as soon as the upgrade is complete.
> >
> > > It explained some things about what it was doing about modules, and
> > > then said to be sure to reboot.  I did that, but then it gave me that
> > > now-familiar message about how the filesystem has errors, hit
> > > control-D to continue...
> >
> > > I booted with Knoppix, made sure my filesystem on /dev/hda1 was NOT
> > > mounted, and ran fsck -f.  It did the 5 passes without mention of
> > > errors.  I ran it a second time with same results.  However, when I
> > > boot from /dev/hda1, I still get the error about a filesystem with
> > > errors!
> > >
> > > Trying to rebuild the server as it was is painful enough - Why would I
> > > be having these filesystem errors?  The HDD is relatively new.
> > >
> > > Any other way to try to get rid of the boot error before I reinstall
> > > etch again?  I hate to do that because I don't understand how these
> > > errors originate, so I don't know why I shouldn't expect them to crop
> > > up again at some point later after another fresh install.
> > >
> > > Why does the fsck during boot find errors when the fsck run via
> > > knoppix on the same filesystem return clean?
> >
> > Don't know why.  Here's how I'd proceed:
> >
> > 1.      boot with the kernel command line: init=/bin/sh since Debian's
> > single-user mode gives you most filesystems already mounted.
> >
> > 2.      run fsck (read the man page to give you the options appropriate
> > to your root fs);  run it on all your filesystems.
> >
> > 3.      shutdown -h and power-cycle.
> >
> > 4.      run aptitude update then upgrade anything required.
> >
> > 5.      reboot.  Watch the screen for any errors on shutdown that would
> >        suggest that the system isn't, e.g. remounting the / fs ro
> >        before halt/reboot.  If in doubt, set up a serial console and
> >        log the output or set up the console output to go to a printer.
> >
> > 6.      If you still have problems, boot knoppix (I use grml) and run
> >        fsck.  If this is ext2/3, I'd run -c -c so that the entire disk
> >        gets read to force the drive firmware to re-map any bad sectors.
> >        While this is running, I'd be watching /var/log/syslog for any
> >        errors from the drive.
> >
> > 7.      Ensure that you have smartmontools installed and run a long
> >        SMART test; when it's complete, check the results on the
> >        drive.
> >
> > ---
> >
> > If all else fails, plan for a reinstall (ensure that you have backups).
> > Then boot knoppix and run wipe on the drive.  This fully exercises the
> > drive to exorcise any gremlins.  Then install etch minimal (don't select
> > any tasks), ensure that aptitude is installed (if it isn't, then apt-get
> > install aptitude), get aptitude set up the way you like with only necessary
> > packages marked as manual, the rest as automatic, then do an update and
> > upgrade before you install any other packages.
> >
> > At each stage, do a shutdown -rF.
> >
> > Doug.
>
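
The checks in steps 2, 6 and 7 above can be sketched as a dry run. This is only an illustration, not Doug's exact procedure: the function name print_check_plan is made up, the device names are the ones from this thread, and the script only prints the commands instead of running them. Per the e2fsck man page, giving -c twice makes the bad-block scan a non-destructive read-write test. The commands themselves are meant to be run as root from a rescue system (Knoppix, grml) with the filesystem unmounted.

```shell
#!/bin/sh
# Dry run only: prints the commands, does not execute anything.
# print_check_plan is a hypothetical helper name used for illustration.
print_check_plan() {
    part=$1   # partition holding the filesystem, e.g. /dev/hda1
    disk=$2   # whole disk that SMART talks to, e.g. /dev/hda
    cat <<EOF
e2fsck -f $part
e2fsck -f -c -c $part
smartctl -t long $disk
smartctl -l selftest $disk
EOF
}

# Device names as used in this thread; adjust for your system.
print_check_plan /dev/hda1 /dev/hda
```

The last smartctl line only makes sense once the long self-test has finished; it lists the self-test log so the result can be read.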
> Doug, thanks for the good ideas - I learned some things from your
> considered response.  I ended up finally reinstalling etch again.
> I've now captured the pertinent part of the boot messages and will
> copy below.  Why does the filesystem check clean once and then come up
> with errors the 2nd time?  You mentioned that I should fix it manually
> - Well, if I enter the root password at the prompt and try to run fsck
> manually, it warns me about the damage I might do to the MOUNTED
> filesystem.  I mentioned in my earlier post that if I boot into
> Knoppix and run fsck, it comes back CLEAN.  So I can't seem to repair
> it with Knoppix fsck, yet I get the error when I boot from my
> /dev/hda1 - the second time in the fsck sequence.  Can you shed any
> light on this?
>
> Here is the pertinent boot sequence:
>
> Checking root filesystem...fsck 1.40-WIP (14-Nov-2006)
> /dev/hda1: clean, 126072/19218432 files, 1420493/38409399 blocks
> done.
>
> Setting up system clock..
> Cleaning up ifupdown....
> Loading kernel modules...loop: loaded (max 8 devices)
> done.
>
> Loading device-mapper supportdevice-mapper: ioctl: 4.7.0-ioctl
> (2006-06-24) initialized: dm-devel@redhat.com
>
> Checking file systems...fsck 1.40-WIP (14-Nov-2006)
> / contains a file system with errors, check forced.
> /:
> Inodes that were part of a corrupted orphan linked list found.
> /: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> fsck died with exit status 4
>
> THANKS!  - John
>
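
For reading the "fsck died with exit status 4" line in the log above: the fsck man page documents the exit status as a sum of flag values, and 4 means errors were left uncorrected. A small sketch that decodes a status follows; the function name describe_fsck_status is made up for illustration, while the values and meanings are the ones listed in fsck(8).

```shell
#!/bin/sh
# Decode an fsck exit status using the flag values from fsck(8).
# The status is a bitwise OR of the values tested below.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $(( status & 1 )) -ne 0 ]   && echo "filesystem errors corrected"
    [ $(( status & 2 )) -ne 0 ]   && echo "system should be rebooted"
    [ $(( status & 4 )) -ne 0 ]   && echo "filesystem errors left uncorrected"
    [ $(( status & 8 )) -ne 0 ]   && echo "operational error"
    [ $(( status & 16 )) -ne 0 ]  && echo "usage or syntax error"
    [ $(( status & 32 )) -ne 0 ]  && echo "checking canceled by user request"
    [ $(( status & 128 )) -ne 0 ] && echo "shared-library error"
    return 0
}

# The status reported in the boot log above.
describe_fsck_status 4
```

After a manual run, the same function can be fed the shell's own status, for example: fsck /dev/hda1; describe_fsck_status $?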

Sorry to answer my own post, but it's FIXED!  When it got to the
"enter root password to enter maintenance", I did that, and at the
prompt entered fsck.  It warned me about running e2fsck on a mounted
filesystem, and I entered "n" and saw "no" echoed.  However, it then
went ahead and ran anyway.  Will someone please explain that?  It seemed to
fix a million things (or at least a few hundred), but now it is
actually fixed!  Why does this work like this, and why didn't it work
running fsck from a live CD?  Thanks again!  - John
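
Since the warning above is about a MOUNTED filesystem, it can help to see how / is actually mounted before answering that prompt. The sketch below does not explain the behaviour John saw; it only reports the mount state. The function name mount_state is made up, and it assumes a Linux-style /proc/mounts with the mount point in the second field and the options in the fourth. It matches on the mount point rather than the device because the root device can show up there under another name (rootfs, /dev/root). Keep in mind that the e2fsck man page says running it on a mounted filesystem is in general not safe.

```shell
#!/bin/sh
# Report how a mount point is mounted, based on a /proc/mounts-style file.
# mount_state is a hypothetical helper name used for illustration.
mount_state() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}   # optional second argument for testing
    awk -v mp="$mountpoint" '
        $2 == mp { opts = $4; found = 1 }   # last matching entry wins
        END {
            if (!found) { print "not mounted"; exit }
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "ro") { print "mounted read-only"; exit }
            print "mounted read-write"
        }' "$mounts_file"
}

# Show the state of the root filesystem if /proc/mounts is available.
if [ -r /proc/mounts ]; then
    mount_state /
fi
```

From a live CD the same call can be pointed at whatever directory the disk might have been auto-mounted on, to confirm it really is unmounted before running fsck.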

