
Re: HELP!!! trying to recover crashed system



Miles Fidelman wrote:
> Hi Folks,
> 
> Here it is, Friday, and my birthday to boot, with one fire on my desk
> already, when I discover that a critical server has crashed....
> 
> The server is running Sarge (I know, I was just about to upgrade, but if
> it ain't broke, why fix it), that just crashed this morning, and I'm
> having a horrible time recovering.  Any help anyone can offer would be
> very much appreciated.
> 
> The basic configuration:
> - i686 motherboard, Pentium chip
> - 2 SATA channels, 2 drives on each (total of 4)
> - 4 partitions on each drive
> - 4 md devices are built across the four drives (for each - 3 hot
> drives, 1 spare)
> - two md devices are used for boot and swap
> - the other two md devices have logical volumes on top of them (LVM) -
> used for / and /backup (large archive)
> - all MBRs set up to boot
> 
> The failure:
> - looks like one of two SCSI interfaces has died, taking down the two
> attached drives
> -- the system should keep running, but doesn't, and won't come up
> --- it gets pretty far in the boot process, then starts throwing errors
> "devfs_mk_dir invalid argument, could not append to parent for /disc"
> and freezes
> - if I boot from a live CD, I get errors from the ATA driver (IO error,
> and so forth) - very obviously hardware errors
> 
> Luckily, I have an identical box available.  So... I simply moved the
> four disk drives from the failed machine, to the new one.  Silly me, I
> figured it would just come up, the RAIDs would repair themselves, and
> I'd be back on the air.  Instead:
> 
> - I get the same devfs_mk_dir error (but if I boot from a live CD, I
> DON'T get any hardware errors)
> -- suggests that one of the drives is so badly corrupted that the RAID
> can't rebuild
> --- when I try looking at the disks (start up the Debian installer, go
> into the partitioner), the partitioner freezes halfway through scanning
> the drives
> --- a little experimentation (pulling different drives) gets me to the
> point where the partitioner will start, and sees the various partitions
> ----- of course, at this point, I abort - I don't want to trash any of
> the data
> - with the bad drive pulled, I try to boot, but all I get is a "boot
> from CD" prompt
> 
> Where this leaves me:
> - I don't want to trash the system (or the user data) on the drives, if
> I can avoid it (obviously)
> - I need to recover sufficiently to boot
> - from there I'd like to try to rebuild the RAID devices and logical
> volumes and see where I am
> - I'm guessing that something very basic has been trashed - like the
> MBR, or grub configuration
> 
> So.... any suggestions would be very much appreciated as to:
> 
> 1. rescue tools - particularly something that lets me try to mount the
> existing md devices and LVMs, and then boot
> 2. generally restoring the system to a bootable state (mbr, grub, etc.)
> 3. thoughts on examining the one drive that might or might not be bad
> -- diagnostic
> -- if good: recovery or reformatting so I can add it back to the
> RAID/LVM pool
> -- if bad: how to configure a spare drive to stick it into the existing
> RAID/LVM pool
> 
> Thanks VERY much.
> Miles Fidelman
Hi,

Since your configuration is based on GRUB "legacy", Super Grub Disk
should let you restore the GRUB configuration.

http://www.supergrubdisk.org/

Next, SysrescueCD comes with every tool you are likely to need for system
recovery, including, of course, the RAID and LVM tools. At boot time you
can choose between two kernels for i386 and two for amd64, and its
hardware detection is usually very good.

http://www.sysresccd.org/Page_Principale
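On your point 3 (working out which drive is actually bad): once a rescue CD has assembled the arrays, /proc/mdstat shows which members the kernel has marked failed "(F)" or spare "(S)", and whether an array is running degraded ("[3/2] [UU_]"). The following is only a minimal Python sketch for reading that file, not part of any of the tools above. The sample text is hypothetical (it is not output from your machine) and assumes the usual Linux md layout of /proc/mdstat.

```python
import re

# Hypothetical /proc/mdstat contents (NOT real output from the poster's box),
# assuming the usual Linux md layout: a 3-way mirror that has lost sdc1,
# and a healthy RAID5 with sdd3 as hot spare.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md0 : active raid1 sdc1[2](F) sdb1[1] sda1[0]
      104320 blocks [3/2] [UU_]

md2 : active raid5 sdd3[3](S) sdc3[2] sdb3[1] sda3[0]
      2048000 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
"""

# "md0 : active raid1 ..." -> array name, state, rest of the line
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)\s+(.*)$")
# "sdc1[2](F)" -> device name plus optional S (spare) / F (failed) flag
MEMBER_RE = re.compile(r"(\w+)\[\d+\](?:\((S|F)\))?")
# "[3/2] [UU_]" -> devices wanted, devices working, per-slot status
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array: {"state", "failed", "spares", "degraded"}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            name, state, rest = m.groups()
            members = MEMBER_RE.findall(rest)
            current = {
                "state": state,
                "failed": [dev for dev, flag in members if flag == "F"],
                "spares": [dev for dev, flag in members if flag == "S"],
                "degraded": False,
            }
            arrays[name] = current
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            wanted, working, _ = s.groups()
            # Fewer working devices than configured means a degraded array.
            current["degraded"] = int(working) < int(wanted)
            current = None  # status line consumed; wait for the next array
    return arrays


if __name__ == "__main__":
    # On a live rescue system you would read open("/proc/mdstat").read()
    # instead of the hypothetical sample.
    for name, info in sorted(parse_mdstat(SAMPLE_MDSTAT).items()):
        print(name, info)
```

A drive that shows up as failed in every array is the one to test with the low-level tools below before deciding whether to re-add it or replace it.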

For low-level hard drive tools, there's nothing like UBCD. The website
layout is pretty bad, but the tool is good.

http://www.ultimatebootcd.com/

Hope it helps.

Tom

