
Re: Could do with some help - Wheezy, Kernel updated, now cannot boot



Ron Leach wrote:
> Some progress.  The CDROM problem that was preventing the D7.7 install DVD
> from running in rescue mode was due to a defective USB DVD device.
> Replacing that enables the rescue mode of the install DVD to run properly.

Yay!  Progress!

> Rescue mode has assembled 7 raid partitions, and '/' was on the 2nd
> partition (/dev/md122).  /dev/md121 - which I first tried - must be the boot
> partition that LILO is using (and I seem to recall that when I built the
> original Lenny RAID system the installer preferred to use LILO as the boot
> system, though I notice that recent Wheezy machine builds are using GRUB).

Depending upon how you partitioned your system one or the other may
have been required.  For example if you had /boot on an lvm partition
then grub did not know how to read it and lilo was required.  There
are other possibilities too.

Regardless of newer versions of grub and lilo and others working with
more complex /boot locations, such as on lvm, or encrypted partitions,
and so forth, I strongly prefer /boot on a plain raw partition.  It is
okay to have /boot be a /dev/md0 multi-device raid1 partition managed
by mdadm.  That works wonderfully because at boot time it appears to
be a plain partition.

> I now have a rescue shell on '/' where '/' has been mounted at /dev/md122.
> ls gives me the 1st level filesystem directory names that I expect to see.
> I tried running mc so that could copy '/' to some external device, but it
> couldn't be found; I think that may in part be because the installer has
> only mounted the '/' partition and not the other various partitions.  I'll
> next look at fstab, and mount the remainder of the filesystem.

Normally the idea in rescue mode is that you are presented with a
shell with the root in your main system.  At that point you can mount
the rest of your system.  It would normally be like this:

  # mount -a

However seeing those large numbers 121 and 122 in your device paths
/dev/md121 and /dev/md122 above I worry that you had /dev/md0 and
/dev/md1 types of names, that those have been swapped for /dev/md121
and /dev/md122, and that the paths in your fstab therefore won't work
at the moment.  Or your fstab might be using other paths, labels,
uuids, and so forth.
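
A quick way to see whether that is the case is to check each /dev path
named in fstab against what actually exists right now.  This is only a
rough sketch.  fstab_missing_devices is a made-up helper name, and it
ignores LABEL= and UUID= entries since those do not depend on the md
numbering.

```shell
# List /dev/... entries in an fstab-style file whose device node is missing.
# Usage: fstab_missing_devices [FILE]   (FILE defaults to /etc/fstab)
fstab_missing_devices() {
    # Skip comment and blank lines; keep only the first field (the device).
    awk '!/^[ \t]*#/ && NF { print $1 }' "${1:-/etc/fstab}" |
    while read -r dev; do
        case "$dev" in
            # Only plain device paths are checked; print those that do not exist.
            /dev/*) [ -e "$dev" ] || echo "$dev" ;;
        esac
    done
}
```

Run with no argument it reads /etc/fstab.  Anything it prints is a
device that fstab expects but that does not exist under that name at
the moment.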

If you could post the contents of your /etc/fstab it would help us
know how you have configured your system.  There are so many
possibilities that it is hard to put forward suggestions without
knowing.

And what was the structure of your configuration?  Which arrays held
which file systems?  Perhaps we can tell from the /etc/fstab lines.

> I'm still uncertain how to start the Wheezy system that is just sitting
> there.  I'd like to be able to do that so as to check quickly that
> everything that should run is running ok - unless that's a bad thing to do
> at this stage.

As far as I can tell from your messages you have identified two
partitions so far.

  /dev/md121 /boot
  /dev/md122 /

Is that all?  Swap?  Other?  If I were at that point I would probably
go ahead and mount everything.  But if the device numbers have changed
then the file systems can only be mounted manually because the names
won't match what is in the /etc/fstab.
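
If you are not sure which arrays the rescue environment actually
assembled, /proc/mdstat lists them.  A small sketch follows.
list_md_arrays is just a name I picked, and it assumes the usual mdN
style of naming.

```shell
# Print the name of each assembled md array, one per line.
# Usage: list_md_arrays [FILE]   (FILE defaults to /proc/mdstat)
list_md_arrays() {
    # Array lines in mdstat look like "md0 : active raid1 ...".
    awk '/^md[0-9]+ :/ { print $1 }' "${1:-/proc/mdstat}"
}
```

Each name it prints corresponds to a /dev/mdN device that can be
mounted by hand.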

I don't see where you have said what is in /etc/fstab and therefore
don't know how many file systems need to be mounted.  But for those
two, with / already mounted from /dev/md122, only /boot remains and
it would be simply this:

  # mount /dev/md121 /boot

At this point you are effectively in the early bootstrap stages of a
system.  You have a shell.  File systems are mounted.  But nothing
else related to a normal system bootstrap may have happened.  The
idea at this point isn't to be able to have a fully normal system.
The idea at this point is to be able to repair your system and reboot
again so that it will be normal.

It has been a long time since I have used lilo so I forget the
commands to rebuild the lilo boot.  But you should be able to run
commands at this point and fix things.

  # lilo-stuff-here  # Sorry I forget how to fix lilo systems now...

If it were a grub based system it would be grub commands.  This would
be typical.

  # grub-install /dev/sda

And then, having fixed grub, reboot and the system should come up
normally.

The change in device numbers may be permanent though and if the device
names don't match up with what you have in /etc/fstab then things
still won't be happy.  I have been there many times.  Here is how you
can reset the device numbers back to what they were.

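Exit the shell on your raid device.  Start a shell in the installer
environment in the initrd.  Unmount the file systems that live on the
arrays, since an array that is still in use cannot be stopped.  Then
use mdadm commands to stop the raid and reset the numbers.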

To stop mdadm on a device:

  mdadm --stop /dev/md121
  mdadm --stop /dev/md122

The preferred minor number stored in the superblock will be out of
sync.  Fix it with:

  mdadm --assemble /dev/md0 --update=super-minor /dev/sda1
  mdadm --assemble /dev/md1 --update=super-minor /dev/sda5

Adjust the device names and numbers above to match your setup, and
list every member partition of each array.  The above would be typical
for the basic installs I have seen and I think likely to be correct
for you.  But double check.

You can also query your raid devices.  Use --detail on the built
device.  Use --examine on the raw device.

  mdadm --detail /dev/md0
  mdadm --examine /dev/sda1

Among other things it will show the "Preferred Minor" number that I
reset above.  It will also show the UUID.
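
To pull just those two fields out of the output, something like this
should do.  It is only a sketch.  md_summary is a name I made up, and
I am assuming the old 0.90 superblock format, which is the one that
records a Preferred Minor.  Newer 1.x superblocks label the field
"Array UUID" and have no Preferred Minor line.

```shell
# Filter `mdadm --examine` (or --detail) output read from stdin down to
# the UUID and Preferred Minor lines.
md_summary() {
    grep -E '(UUID|Preferred Minor) *:'
}

# Typical use, as root:
#   mdadm --examine /dev/sda1 | md_summary
```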

> I'm also uncertain how to find out what has gone wrong with the boot
> system and how to rebuild it, including whether to replace LILO with
> GRUB.

I can't tell what went wrong either.  But if in trying to fix it, such
as trying to set lilo up again, you see errors then those errors would
probably be what went wrong.

I can't recommend anything on whether you should stick with lilo or
move to grub.  If it were the old grub1 then I would have strongly
recommended grub because the boot time flexibility was quite nice.
But with the current new grub2 complete rewrite that capability is now
quite a bit more tedious.  I can't really recommend one over the other
anymore.

> First, though, I'm going to mount the file systems and try to image things.

At your main system root prompt if networking is not online you should
be able to bring networking up and online with:

  # ifup eth0

Check /etc/network/interfaces for how you have it configured.  After
networking is up then you should be able to have the full capabilities
of your rescued system to copy files off.  You should also be able to
apt-get install new packages.  You will be at the system console and
in control and able to make administrative changes.  But I wouldn't
try to bring up X windows, graphical desktops, nor any of that type of
thing.  Simply use the system console to repair and then reboot.
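
For the imaging you mentioned, a plain tarball per file system is
often enough.  Here is a minimal sketch assuming GNU tar, which is
what the installed Wheezy system provides.  archive_tree is a
hypothetical helper name, and the destination path in the example is
a placeholder for wherever your external disk or network share is
mounted.

```shell
# Archive one directory tree into a gzip-compressed tarball.
# Usage: archive_tree SRC_DIR DEST_TARBALL
archive_tree() {
    # -p keeps permissions; --one-file-system stays on the file system
    # holding SRC_DIR instead of descending into other mounts; -C makes
    # the paths stored in the archive relative to SRC_DIR.
    tar -cpzf "$2" --one-file-system -C "$1" .
}

# Example (destination is a placeholder):
#   archive_tree /etc /mnt/backup/etc.tar.gz
```

Because it does not cross mount points you would run it once per
mounted file system.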

> Suggestions are more than welcome,

Hope that helps!
Bob
