
Re: fsck'd



On 3/7/08, Douglas A. Tutty <dtutty@porchlight.ca> wrote:
> On Fri, Mar 07, 2008 at 08:29:08AM -0500, John Fleming wrote:
> > Background - I had a well-established LAMP server that was giving some
> > filesystem errors on boot, with the "hit control-D to continue or give root
> > password to fix manually" message. It would go ahead and work normally if I
> > hit Control-D.
>
> You should have fixed it manually.  Control-D is in case the person
> sitting there doesn't have the root password and is generic to Debian's
> single-user mode.
>
> > However, I wanted to try to get rid of the error and need for human
> > intervention in the event of the need for a remote reboot, so I tried
> > to fix the errors with fsck.  Somehow I ended up with a badly trashed
> > filesystem and inability to reboot.
> >
> > After much gnashing of teeth and consideration of my options, I
> > installed a new etch system.  I installed several benign packages like
> > apache.  I ran update and dist-upgrade to bring the system up to date.
> > When I ran the upgrade, it told me that it was trying to install an
> > identical kernel image.
>
> Well, it was a new kernel image with the same version code so that the
> new modules would be going in the same directory as the old modules.
> The new modules won't work with the old kernel so that if you do
> anything to trigger a module load, bad things can happen, which is why
> you reboot as soon as the upgrade is complete.
>
> > It explained some things about what it was doing about modules, and
> > then said to be sure to reboot.  I did that, but then it gave me that
> > now-familiar message about how the filesystem has errors, hit
> > control-D to continue...
>
> > I booted with Knoppix, made sure my filesystem on /dev/hda1 was NOT
> > mounted, and ran fsck -f.  It did the 5 passes without mention of
> > errors.  I ran it a second time with same results.  However, when I
> > boot from /dev/hda1, I still get the error about a filesystem with
> > errors!
> >
> > Trying to rebuild the server as it was is painful enough - Why would I
> > be having these filesystem errors?  The HDD is relatively new.
> >
> > Any other way to try to get rid of the boot error before I reinstall
> > etch again?  I hate to do that because I don't understand how these
> > errors originate, so I don't know why I shouldn't expect them to crop
> > up again at some point later after another fresh install.
> >
> > Why does the fsck during boot find errors when the fsck run via
> > knoppix on the same filesystem return clean?
>
> Don't know why.  Here's how I'd proceed:
>
> 1.      boot with the kernel command line: init=/bin/sh since Debian's
> single-user mode gives you most filesystems already mounted.
>
> 2.      run fsck (read the man page to give you the options appropriate
> to your root fs);  run it on all your filesystems.
>
> 3.      shutdown -h and power-cycle.
>
> 4.      run aptitude update then upgrade anything required.
>
> 5.      reboot.  Watch the screen for any errors on shutdown that would
>        suggest that the system isn't, e.g. remounting the / fs ro
>        before halt/reboot.  If in doubt, set up a serial console and
>        log the output or set up the console output to go to a printer.
>
> 6.      If you still have problems, boot knoppix (I use grml) and run
>        fsck.  If this is ext2/3, I'd run -c -c so that the entire disk
>        gets read to force the drive firmware to re-map any bad sectors.
>        While this is running, I'd be watching /var/log/syslog for any
>        errors from the drive.
>
> 7.      Ensure that you have smartmontools installed and run a long
>        SMART test and when it's complete, check the results on the
>        drive.
>
> ---
>
> If all else fails, plan for a reinstall (ensure that you have backups).
> Then boot knoppix and run wipe on the drive.  This fully exercises the
> drive to exorcise any gremlins.  Then install etch minimal (don't select
> any tasks), ensure that aptitude is installed (if it isn't, then apt-get
> install aptitude), get aptitude set up the way you like with only necessary
> packages marked as manual, the rest as automatic, then do an update and
> upgrade before you install any other packages.
>
> At each stage, do a shutdown -rF.
>
> Doug.

Doug, thanks for the good ideas - I learned some things from your
considered response.  I ended up finally reinstalling etch again.
I've now captured the pertinent part of the boot messages and will
copy below.  Why does the filesystem check clean once and then come up
with errors the 2nd time?  You mentioned that I should fix it manually
- Well, if I enter the root password at the prompt and try to run fsck
manually, it warns me about the damage I might do to the MOUNTED
filesystem.  I mentioned in my earlier post that if I boot into
Knoppix and run fsck, it comes back CLEAN.  So I can't seem to repair
it with Knoppix fsck, yet I get the error when I boot from my
/dev/hda1 - the second time in the fsck sequence.  Can you shed any
light on this?
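
Since the warning depends on whether the filesystem is mounted, and how, here is a minimal sketch for checking that before running fsck by hand. It assumes the usual /proc/mounts layout (device, mount point, type, options, ...). The function name mount_state is made up for illustration, and on some kernels the root device is listed as /dev/root or rootfs rather than /dev/hda1, so a "not mounted" answer for the root device should be double-checked.

```shell
# Print how a device is mounted ("rw" or "ro"), or "not mounted".
# $1 = device to look up
# $2 = mounts table to read (hypothetical parameter; defaults to /proc/mounts)
mount_state() {
    awk -v dev="$1" '
        $1 == dev {
            # Field 4 is the comma-separated mount option list.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") state = opts[i]
        }
        END { print (state == "" ? "not mounted" : state) }
    ' "${2:-/proc/mounts}"
}
```

From the maintenance shell, "mount_state /dev/hda1" would then show whether the device is listed read-write, read-only, or not at all.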

Here is the pertinent boot sequence:

Checking root filesystem...fsck 1.40-WIP (14-Nov-2006)
/dev/hda1: clean, 126072/19218432 files, 1420493/38409399 blocks
done.

Setting up system clock..
Cleaning up ifupdown....
Loading kernel modules...loop: loaded (max 8 devices)
done.

Loading device-mapper supportdevice-mapper: ioctl: 4.7.0-ioctl
(2006-06-24) initialized: dm-devel@redhat.com

Checking file systems...fsck 1.40-WIP (14-Nov-2006)
/ contains a file system with errors, check forced.
/:
Inodes that were part of a corrupted orphan linked list found.
/: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
fsck died with exit status 4
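
For reference, the exit status fsck reports is a bit mask, as documented in fsck(8), so status 4 above means errors were left uncorrected. A minimal sketch of decoding it follows; the function name decode_fsck_status is made up for illustration, and the wording of each message is paraphrased from the man page.

```shell
# Decode an fsck exit status into its documented meanings.
# The status is a bit mask, so several conditions can be set at once.
decode_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((code & 1)) -ne 0 ]   && echo "file system errors corrected"
    [ $((code & 2)) -ne 0 ]   && echo "system should be rebooted"
    [ $((code & 4)) -ne 0 ]   && echo "file system errors left uncorrected"
    [ $((code & 8)) -ne 0 ]   && echo "operational error"
    [ $((code & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ]  && echo "canceled by user request"
    [ $((code & 128)) -ne 0 ] && echo "shared library error"
    return 0
}

# The status from the boot log above:
decode_fsck_status 4
```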

THANKS!  - John

