
LVM/grub2/UUID don't get along

After installing squeeze on a new disk, booting failed because the
kernel was unable to find the root device.  This is a report of some
things that went wrong and some work-arounds.  My main motivation is to
highlight some areas that seem to need improvement.  I don't need help,
since I have (I think) fixed the problem on my system.

Boot failed because the root device was on LVM, and LVM had not been
activated by the initrd scripts.  In particular, scripts/local-top/lvm2
does a number of tests to determine if root is on an LVM volume.  It is
expecting /dev/mapper/.... (and a few other cases), but not UUID=.....
So it never activates the LVM volumes.
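
To make the failure mode concrete, here is a minimal sketch of the kind
of test described above.  It is an assumption-laden simplification, not
the actual scripts/local-top/lvm2 source: the function name
classify_root is hypothetical, and the "few other cases" the real script
handles are omitted.

```shell
#!/bin/sh
# Hypothetical simplification; NOT the real scripts/local-top/lvm2 code.
# Prints "activate" when the root= value looks like a device-mapper path
# and "skip" for anything else, including UUID= values.
classify_root() {
    case "$1" in
        /dev/mapper/*)
            echo activate   # explicit device-mapper path: LVM is needed
            ;;
        *)
            echo skip       # UUID=... gives no hint that LVM is involved
            ;;
    esac
}
```

With this logic, classify_root /dev/mapper/daisy-root_rescue prints
"activate", while classify_root UUID=2707f7ec-48cc-4c41-98ec-4dc5ee8bb8dd
prints "skip", which corresponds to the volumes never being activated.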

The autogenerated lines in grub.cfg have two root parameters, the
first as a conventional device and the second as a UUID; the second
apparently is the one that counts.
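
One way to picture "the second is the one that counts" is a parser that
walks the kernel command line left to right and lets each root= argument
overwrite the previous one.  The sketch below assumes that behavior; the
function name last_root is hypothetical and this is not code taken from
the initrd.

```shell
#!/bin/sh
# Hypothetical illustration of "last root= wins" command-line parsing.
# Takes a kernel command line as one string and prints the value of the
# final root= argument (empty output if there is none).
last_root() {
    result=""
    for arg in $1; do            # unquoted on purpose: split on whitespace
        case "$arg" in
            root=*) result="${arg#root=}" ;;   # later values override earlier ones
        esac
    done
    printf '%s\n' "$result"
}
```

For example,
last_root "root=/dev/mapper/daisy-root_rescue root=UUID=2707f7ec-48cc-4c41-98ec-4dc5ee8bb8dd ro"
prints UUID=2707f7ec-48cc-4c41-98ec-4dc5ee8bb8dd.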

The exact process by which I got to this state was definitely
non-standard, but it appears that this might be the regular behavior.  I
thought I'd report it here, since even if it's a bug it's not clear what
it is a bug in.  One interpretation is that the grub2 config scripts (or
perhaps the filesystem base scripts) should not generate the root=UUID
entries on the kernel invocation line. Alternately, the lvm initrd
scripts should cope with the UUID better.
Against the first interpretation are all the arguments for using UUIDs;
against the second is the difficulty (perhaps impossibility by the time
one is running in the initrd) of determining from the UUID if the device
is on lvm.  Perhaps a solution would be for the local-top/lvm2 script to
look for the UUID on the disks and, if unsuccessful, activate LVM with
vgchange -ay.
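
The fallback suggested above could look roughly like the following.
This is only a sketch under assumptions: resolve_root is a hypothetical
name, not part of any shipped script, and it assumes a blkid that
supports the -U lookup option (as the util-linux version does) is
available inside the initrd, which may not be the case.

```shell
#!/bin/sh
# Hypothetical fallback, not an existing initrd script.
# Given root=UUID=..., look for the UUID on the visible block devices;
# if nothing matches, activate all volume groups and look again.
resolve_root() {
    uuid="${1#UUID=}"
    dev=$(blkid -U "$uuid" 2>/dev/null) || dev=""
    if [ -z "$dev" ]; then
        # UUID not visible yet: it may live on a logical volume.
        lvm vgchange -ay >/dev/null 2>&1
        dev=$(blkid -U "$uuid" 2>/dev/null) || dev=""
    fi
    if [ -n "$dev" ]; then
        printf '%s\n' "$dev"     # device node carrying the requested UUID
    else
        return 1                 # still not found, even with LVM active
    fi
}
```

The point of the design is that LVM is activated only when the plain
lookup fails, so systems whose root is not on LVM pay no extra cost.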

Additionally, the autogenerated grub.cfg did not even have the correct
UUID.  It had the UUID of the physical partition underlying the logical
volume of interest (or maybe of the boot partition), not the UUID of the
logical volume itself.  The VG includes several physical partitions, not
entire disks.

For the record, there were a couple of different fixes/work-arounds.
If one gets to the point inside the initrd where the kernel's root is
not found, one is dropped into a (busybox) shell.  Typing vgchange -ay
will activate the logical volumes; typing exit will cause the boot
process to continue successfully.

A more permanent solution is to edit the grub.cfg, or
ideally /etc/default/grub, which includes options not to generate the
UUIDs for the kernel root lines, and a custom set of options for the
main kernel startup.  E.g.,
GRUB_CMDLINE_LINUX="root=/dev/mapper/daisy-root_rescue ro rootdelay=20"
# autogenerated line was
#GRUB_CMDLINE_LINUX="root=UUID=2707f7ec-48cc-4c41-98ec-4dc5ee8bb8dd ro rootdelay=20"

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
I'm not sure the last uncommenting is wise.

Gory details about my non-standard process start here, and include some
additional issues that seem more likely to be idiosyncratic to the
non-standard installation.  Basically, I followed Appendix D3 of the
installation guide
http://www.debian.org/releases/stable/amd64/apds03.html.en (though I was
on i386).
Initial state: running Debian Lenny system, 686 architecture, lots of
disks making heavy use of lvm and token use of dm-crypt.  An internal
SATA disk is the boot disk; there is also an external data disk connected
by USB, an internal IDE disk connected to the main board, and 2 internal
IDE disks connected to a Promise controller.  Installed a new SATA disk (SATA 1,
where SATA 0 is the boot disk) and used (lenny) parted to make it GPT
with a BIOS boot partition.  Bootstrapped a squeeze system, chrooted
into it, and from inside the chroot installed a kernel and grub2.
Installed grub2 onto the new disk.  The new disk had a separate, vanilla
(for GPT), partition to hold the squeeze /boot drive, and the new root
partition (including the whole rest of the system) came out of a logical
volume from the same VG that held much of the Lenny system.  Chain
loaded from my old grub1 into grub2 on the new disk.

This account omits many false starts.

The first cut of the grub setup was wildly off about the mapping from
grub disks to real disks.  Some of this seems inevitable.  In the
chroot, for example, /boot did not appear to be on a separate partition.
I think this is also the reason the initrd and vmlinuz symlinks ended up
in / instead of /boot.  Also, device naming and ordering differed in
Lenny (I did mount --bind /dev /squeezeroot/dev) from what it would be
in squeeze (hd? in Lenny becomes sd? with the squeeze kernel).

Even when I finally booted into squeeze, grub running inside linux did a
poor job of getting the device mapping right.  It assumed that all my IDE
drives would appear before (in the grub (hd?) sense) all SATA drives
(ignoring the USB drive).  In fact, only the motherboard-connected IDE
drive appeared first.  I ended up editing the grub map by hand.

To recap the apparent problems, in estimated decreasing likelihood of
appearing for someone doing a more conventional installation:

1. grub.cfg specifies the kernel root parameter using UUID.  The LVM
scripts in the initrd fail to recognize such a parameter as requiring
LVM activation, as a result of which there is no root to mount and the
boot fails (absent the work-arounds above).  This assumes the use of
initrd, LVM, and root partition (i.e., logical volume) on LVM.

2. The UUID given is not the correct one.  But unless the VG is
activated (see point 1) the proper UUID would be invisible anyway.

3. When grub configuration scripts run inside the OS (even real squeeze,
not a chroot) they appear to guess that all IDE drives will appear
before all SATA drives in the grub boot environment.  In some cases,
e.g., mine, in which some IDE drives were connected to a supplementary
card, this assumption is false.

4. Running inside a chroot may confuse grub and other things (kernel
installation scripts deciding where to put symlinks) about where the
real partitions are.  Or perhaps I need to set a parameter to say that
installation is to /boot, not /; I dimly recall having done something
like that in Lenny with grub1.  In that case it's not chroot-specific,
but it is specific to having /boot on a separate partition.

Ross Boylan
