
Re: [ squeeze ] Grub2 RAID1 LVM2 boot failure



d.sastre.medina@gmail.com wrote:

> On Sat, May 29, 2010 at 05:44:22PM -0400, Tom H wrote:
>> On Sat, May 29, 2010 at 7:06 AM, David Sastre Medina
>> <d.sastre.medina@gmail.com> wrote:
>> >
>> > Grub2 is failing to boot a softRAID1 + LVM2 squeeze box.

I use an equivalent setup and it was all set up correctly and
automatically by the `update-grub2` command (once the system had
booted correctly).

Keep reading.

I have 'md1' as '/boot' and an lvm2 '/' on 'md2'; this is what my system uses:

grub.cfg:
    menuentry "Debian GNU/Linux, with Linux 2.6.32-trunk-amd64" --class debian --class gnu-linux --class gnu --class os {
        insmod raid
        insmod mdraid
        insmod ext2
        set root='(md1)'
        search --no-floppy --fs-uuid --set 1be9c4e5-70cd-4662-81e6-44e76cff20d8
        echo    Loading Linux 2.6.32-trunk-amd64 ...
        linux   /vmlinuz-2.6.32-trunk-amd64 root=UUID=25defa7a-93cb-40eb-9a76-c326f0b2dffc ro  vga=792
        echo    Loading initial ramdisk ...
        initrd  /initrd.img-2.6.32-trunk-amd64
    }

blkid: `blkid /dev/md[1,2]` (run `blkid -g` first to clear any stale cached entries)
    /dev/md1: UUID="1be9c4e5-70cd-4662-81e6-44e76cff20d8" TYPE="ext2" 
    /dev/md2: UUID="25defa7a-93cb-40eb-9a76-c326f0b2dffc" TYPE="ext2"

grub-probe: `grub-probe -t fs_uuid /boot`, `grub-probe -t fs_uuid /`
    1be9c4e5-70cd-4662-81e6-44e76cff20d8
    25defa7a-93cb-40eb-9a76-c326f0b2dffc

mdadm: `sudo mdadm -D /dev/md[1,2] | grep UUID`
    UUID : ff7e23a3:dc6327b6:73d158fc:63c6b3dd
    UUID : 157b664b:7b41974f:73d158fc:63c6b3dd

It's booting fine all the time.
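
Note the two different kinds of UUID in the outputs above: blkid and
grub-probe report filesystem UUIDs (dash-separated), while `mdadm -D`
reports the array UUID (four colon-separated groups). A minimal shell
sketch to tell them apart by their form; `uuid_kind` is a hypothetical
helper name of mine, and it only pattern-matches the text, it does not
look at any device:

```shell
# uuid_kind: hypothetical helper, classifies a UUID string by its shape only.
uuid_kind() {
    case "$1" in
        *:*:*:*)   echo "md-array" ;;    # form printed by `mdadm -D`
        *-*-*-*-*) echo "filesystem" ;;  # form printed by blkid / grub-probe
        *)         echo "unknown" ;;
    esac
}

uuid_kind 1be9c4e5-70cd-4662-81e6-44e76cff20d8   # prints: filesystem
uuid_kind ff7e23a3:dc6327b6:73d158fc:63c6b3dd    # prints: md-array
```

Only the first kind belongs on the `search --fs-uuid` line or in root=UUID=.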

>> > root@sysresccd /root % mdadm --detail /dev/md0 /dev/md0:
>>
>> > UUID : 8052f7d4:54a97fbb:731031f6:bc3d041c

That UUID is the mdadm array UUID; it is not the one grub will use to boot.

>> I see two possible problems when looking at your grub.cfg.
>> 
>> 1. There isn't an "insmod lvm" within the menuentry stanza. ext2,
>> raid, and mdraid are insmod'd twice in the header and once in the
>> menuentry and lvm is inmod'd just once in the header. (This is one of
>> the grub2 mysteries; why multiple insmods of the same modules?). I
>> doubt that this is the source of the problem (the first insmod must be
>> enough!) but you could add "insmod lvm" within the menuentry.
> 
> Already tried that. No success.

That is not your problem IMO.

>> 2. In the uuid of the search line, what is
>> 785366b0-d597-4e9c-9284-b6b9161236ed? One of your /dev/sX1's uuid?
>> Since raid and mdraid are loaded, can't you/shouldn't you use the md0
>> uuid above?

> I also tried that. It fails. 
> That UUID belongs to /root_vg-root_lv, where the root filesystem
> resides.
> The UUID can be confirmed at the grub prompt by issuing
> grub> ls (root_vg-root_lv)

No, the `root` partition from grub's point of view is the partition
it boots from, i.e. /boot. The kernel then needs its own `root`
filesystem, and that is where the UUID for /root_vg-root_lv goes: in
the `linux` line.
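
To see the two roots side by side, here is a rough shell sketch that
pulls both values out of a menuentry. The function names are
hypothetical helpers of mine, the sample is a shortened copy of my own
entry above, and the parsing assumes the UUID is the last word on the
`search` line, as in the grub.cfg shown here:

```shell
# Hypothetical helpers: they only parse text read from stdin.
grub_search_uuid() {
    # last word of each `search` line = filesystem grub looks for (/boot)
    awk '$1 == "search" { print $NF }'
}

kernel_root_arg() {
    # value of root= on each `linux` line = root filesystem for the kernel
    awk '$1 == "linux" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^root=/) { sub(/^root=/, "", $i); print $i }
    }'
}

sample_entry() {
cat <<'EOF'
    set root='(md1)'
    search --no-floppy --fs-uuid --set 1be9c4e5-70cd-4662-81e6-44e76cff20d8
    linux   /vmlinuz-2.6.32-trunk-amd64 root=UUID=25defa7a-93cb-40eb-9a76-c326f0b2dffc ro  vga=792
EOF
}

sample_entry | grub_search_uuid   # the /boot filesystem (md1 here)
sample_entry | kernel_root_arg    # the / filesystem handed to the kernel
```

On my system the first value matches `grub-probe -t fs_uuid /boot` and
the second matches `grub-probe -t fs_uuid /`.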

> Note that `boot' is a multidisk partition (sda1 and sdb1, which assemble
> md0), thus root='(md0)' makes sense from a grub point of view.

Correct.

> And md1 is the result of assembling sda2 and sdb2. This md device has only
> one VG on top of it, root_vg, with several LVs in it, one of these LVs
> being my root_lv.

That looks OK.
 
> This my default menuentry now:
> menuentry "Debian GNU/Linux, with Linux 2.6.32-3-686-bigmem" --class debian --class gnu-linux --class gnu --class os {
>         insmod raid
>         insmod mdraid
>         insmod lvm
>         insmod ext2
>         set root='(md0)'
>         search --no-floppy --fs-uuid --set 785366b0-d597-4e9c-9284-b6b9161236ed
>         echo    Loading Linux 2.6.32-3-686-bigmem ...
>         linux   /vmlinuz-2.6.32-3-686-bigmem root=/dev/mapper/root_vg-root_lv ro rootdelay=15 quiet
>         echo    Loading initial ramdisk ...
>         initrd  /initrd.img-2.6.32-3-686-bigmem
> }
 
> The `set root' entry says what is *root* for grub, I understand this as:
> where are /boot/grub/grub.cfg, /vmlinuz-`uname -r` and /initrd.img-`uname
> -r`. So IMHO it should be called boot='(md0)' for better understanding and
> disambiguation from the *other* root in the `linux' line.

Yes, that's exactly it.

> The GRUB root device is not the same as the Linux kernel root= parameter.
> BTW this command is undocumented in the wiki, still uses grub-legacy's
> info, which doesn't apply anymore, given the `root' command has been
> replaced.

But grub has nothing to do with this parameter; it is a kernel `boot parameter`
(well, more of an initrd boot parameter, but that is a different area):
    http://www.mjmwired.net/kernel/Documentation/kernel-parameters.txt
    line 2193.
 
> The `search' line, as stated in the grub wiki:
 
> Search devices by file, filesystem label or filesystem UUID. If --set
> is specified, the first device found is set to a variable. If HD
> variable name is specified, "root" is used.

I believe there is a mistake and the `HD` should be `NO`, meaning that if
no variable name is supplied, the value is assigned to the `root` variable.

This effectively repeats what the previous command did, IMO.
 
> I take this to mean that the first device found _which UUID is_ 785...
> (the UUID of my root_vg-root_lv) will be the `root' filesystem.

Well, the root for grub, not the root for the kernel.
 
> And yet another definition of `root' after the `linux' call.
> That one states that:

> root=/dev/mapper/root_vg-root_lv  which could be written also as:
> root=LABEL=root  or even
> root=UUID=785366b0-d597-4e9c-9284-b6b9161236ed

Yes, all are correct, and I strongly recommend using the UUID value from the blkid command.

Warning: run `blkid -g` first so that blkid clears the stored UUIDs in its cache.

> The three of them should be right. None of them work.

Your problem seems to be that the KERNEL can't find the root
filesystem; there is nothing grub can do to solve that.
 
> If I suppress the `quiet' option from the `linux' line, what I can see
> is LVM initializing *before* mdadm has got its job done:
 
> "Volume group "root_vg-root_lv not found
>  Skipping volume group root_vg
>  Unable to find LVM volume root_vg-swap_lv
>  mdadm:/dev/md0 has been started with two drives
>  mdadm:/dev/md1 has been started with two drives
>  Gave up waiting for root device."

That confirms it: the kernel is not finding the correct `root` filesystem.
Use the UUID reported by blkid in root= on the `linux` line.

> So it looks like a timing issue *but*, I have tried to issue manually
> the commands in the right order at the grub prompt:
> 1) insmod-ing raid, mdraid, lvm and ext2; setting root to md0;
> 2) searching for devices (also a variant without this step);
> 3) calling linux with the right root device
>  (all three variants of this step: dev name, UUID and LABEL and with
>  different rootdelay timings, always without `quiet') and, finally;
> 4) calling initrd.
 
> Failure again. No way root_vg to be found.

Once you have booted into this system, `update-grub` should generate
this file correctly, and grub.cfg will be updated on any kernel change.

Make sure `update-grub` is correctly creating a good grub.cfg before a re-boot.
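
One rough way to do that check from the shell, as a sketch only:
`cfg_has_search_uuid` is a hypothetical helper of mine that reads a
grub.cfg on stdin and succeeds if some `search` line ends in the UUID
you pass in. It assumes the UUID is the last word on that line, as in
the entries shown in this thread.

```shell
# cfg_has_search_uuid UUID < grub.cfg
# Hypothetical helper: exit status 0 if any `search` line ends in UUID.
cfg_has_search_uuid() {
    awk -v want="$1" '
        $1 == "search" && $NF == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Small demo on one line of text (no real device is touched):
printf 'search --no-floppy --fs-uuid --set %s\n' \
    1be9c4e5-70cd-4662-81e6-44e76cff20d8 |
    cfg_has_search_uuid 1be9c4e5-70cd-4662-81e6-44e76cff20d8 &&
    echo "search line matches"

# On the real system (as root), something like:
#   cfg_has_search_uuid "$(grub-probe -t fs_uuid /boot)" < /boot/grub/grub.cfg
```

This only tells you that the /boot UUID shows up on a `search` line; it
says nothing about the root= value on the `linux` line, which you still
have to compare against blkid yourself.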

> One further question: after a reboot, while at the grub screen, before
> doing anything else, if I enter the command line and type `ls' at the
> prompt, I can see all of my LVs, and listing anyone of them returns:
> device name, filesystem type, label, last modification time and UUID.
> Where does this info come from? Supposedly, there aren't mods loaded to
> read that yet, until after `insmod' loads them, are there?

That's the 'core.img' code for grub, which needs to read
all the UUIDs correctly to do its job.


-- 
Antonio Perez

