Re: strange booting issues after hd-crash



On Sat, Nov 18, 2006 at 09:49:29AM +0100, 803026@linzag.net wrote:
[edited]
> ... how everything started: I reinstalled GRUB, but I happened to type
> "/dev/hda1" instead of "/dev/hda", which killed the boot sector of my
> Win2K partition. So I tried to rescue it with my Windows CD, which ended
> up screwing up the whole MBR and losing the partition table.  "gpart"
> managed to guess my old partition table and helped me rescue my Linux
> system and data.

> Ever since then, my Etch has had some strange issues at boot time.
> After the line "Setting up ICE socket directory (...)" it asks for the
> root password ("for maintenance" - what am I expected to maintain?).
> The next step, "Starting MTA:", takes a long, long time and often hangs
> completely. (I tried "dpkg-reconfigure exim4", but it didn't help.) And
> once, after all that, it wouldn't even start X (but this happened only
> once).
> 
> The fact that this happens irregularly is giving me a headache, because
> (to me) it smells a bit like (physical?) hard disk trouble. Is it
> possible that the Linux live CD, which I booted to run gpart, used some
> space on my HD as swap and overwrote important data? But fsck reports
> no errors...  Has anybody ever experienced something like that?

Questions and ideas:

What happened that prompted you to reinstall GRUB?

What sort of partition layout?  RAID?  LVM?  What types of file systems?

gpart may have guessed slightly off.

There may be a physical problem.

Ensure that your Etch is up-to-date.

Are any drive errors showing up in your logs?  For example, messages
showing that the kernel had to retry a read or write several times
before the drive responded.
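As a rough first pass over the logs, a small sh function can count kernel-log lines that look like IDE drive errors.  This is only a sketch: the function name count_drive_errors, the default path /var/log/kern.log, and the match pattern are assumptions, modelled on typical 2.6-era messages such as "hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }".

```shell
# Sketch only: count kernel log lines that look like IDE drive errors.
# count_drive_errors is a hypothetical helper; the default log path and the
# pattern are assumptions modelled on 2.6-era messages such as
#   hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
#   hda: dma_intr: error=0x40 { UncorrectableError }
count_drive_errors() {
    log="${1:-/var/log/kern.log}"
    # -E: extended regex, -i: ignore case, -c: print the number of matching lines.
    # grep exits 1 when nothing matches; "|| true" keeps that from counting
    # as a failure (the count "0" is still printed).
    grep -E -i -c 'hd[a-z]: .*(error|status=0x51)' "$log" || true
}
```

Run it as "count_drive_errors /var/log/kern.log".  A non-zero count is worth reading in full; a zero count does not prove the drive is healthy.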

Have you run a manual fsck, or only the boot-time preening?  With
non-destructive read-write testing?  Does this produce any errors?  Do
this from single-user mode (or, even better, by booting with
init=/bin/sh) at the console, and watch for any kernel messages posted
there.

It sounds like you've been able to retrieve the data.  Back it up,
then decide which is more worthwhile for you:

	poke away at the current situation and try to get back a solid
	reliable system.

	Or

	Focus on verifying that the drive itself is OK by
		getting new install media (a more current Etch),
		rebooting from the install media, going to the console,
		and running
			badblocks -vw /dev/hda

		Be warned that -w is a destructive write-mode test: it
		has the added benefit of totally erasing your partition
		table (and everything else).

	Then, if the drive really is OK, either reinstall as it was, or
	just put Windows on it and buy a new drive for Linux.

My personal feeling is that with something as foundational as a hard
drive, if there's any suspicion of unreliability that can't be fixed,
you're better off reinstalling on a reliable drive.  Save the
questionable drive for non-system use, even for home, with good backups
if need be; at least the system will run.  Drives are cheap: buy two and
use RAID 1.

Once you have a working Linux system, as part of your backup strategy,
keep a copy of the partition table:

	sfdisk -d /dev/hda > hda.out

which can be restored with

	sfdisk /dev/hda < hda.out

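If you want more than one snapshot (say, one each time the disk layout changes), a small wrapper can write each "sfdisk -d" dump to a dated file.  This is a hypothetical helper, not part of sfdisk: the function name backup_partition_table, the naming scheme, and the default destination directory are assumptions.  Copy the result somewhere other than the disk it describes.

```shell
# Sketch only: save "sfdisk -d" output for a disk to a dated file and print
# the file name.  backup_partition_table and the naming scheme are hypothetical.
backup_partition_table() {
    disk="${1:-/dev/hda}"
    dest="${2:-.}"
    # e.g. /dev/hda -> ./hda-20061118.sfdisk
    out="$dest/$(basename "$disk")-$(date +%Y%m%d).sfdisk"
    # Dump to a temporary file first so a failed dump never replaces a good one.
    if sfdisk -d "$disk" > "$out.tmp"; then
        mv "$out.tmp" "$out" && echo "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}
```

Restoring works exactly as above, feeding the dated file to sfdisk on standard input.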
These suggestions may be off base depending on the answers to the
questions I posed.

Good luck.

Doug.
