2.4 (devfs) -> 2.6 (udev & !devfs) with raid+lvm on /root transition problems



No bug for this, mostly just getting this info out there somewhere in
the hopes that if someone else is trying to do this it might be of some
help.

Configuration:

2.4.27 standard debian kernel with devfs=mount

/dev/md/0 == swap
/dev/md/1 == /boot
/dev/md/6 + /dev/md/7 == shaktivg
/dev/shaktivg/rootlv == /root

The MDs are raid1.

mkinitrd creates a flawless initrd for the 2.4.x kernel (a big step up
from some time ago when I had to drop in scripts for setting up lvm).

2.6.9 standard debian kernel with devfs=mount

I didn't spend a lot of time with this setup (devfs is deprecated and
udev seems like the right way to go).  Interestingly, in order for LVM
to work, devfs has to be compiled in and mounted temporarily.  The same
seems to be true for md (to get at the md device nodes for mdadm).  So,
even though *I'm* not using devfs, the system is.
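For what it's worth, the temporary devfs mount in the initrd boils down
to something along these lines (a sketch only; the exact script differs
between initrd-tools versions):

    # mount devfs just long enough for mdadm and the lvm tools to find
    # their device nodes, then get rid of it again
    mount -t devfs none /dev
    # ... assemble the arrays and activate the VG here ...
    umount /dev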

I probably should have just left things alone with devfs=mount.

2.6.9 standard kernel with no devfs=mount.

This is where things started going downhill.

1) I've always used devfs on this system, so /dev (the underlying dev
directory) was completely empty.  This resulted in all kinds of
mischief.  For instance, /dev/console was not in the initrd, so in init,
where /dev/console is used, I'd get an error that /dev/console is
read-only (the shell can't create the file /dev/console for redirection)
and it fails.  I also lacked /dev/hd* and /dev/md*.
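Populating the bare /dev with the static nodes the initrd needs is just
a handful of mknod calls once you know the major/minor numbers; a sketch
(the hd/md nodes shown are only illustrative for this layout):

    mknod /dev/console c 5 1
    mknod /dev/null    c 1 3
    # block nodes: md arrays are major 9, first IDE channel is major 3
    mknod /dev/md1     b 9 1
    mknod /dev/hda1    b 3 1
    # ...and so on for the rest of the /dev/hd* and /dev/md* nodes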

2) LVM isn't smart enough to ignore RAID1 components.  This means that
unless you filter out the underlying devices (/dev/hd*) it will grab the
first partition that looks like an LVM component and then complain about
duplicate UUIDs for the others.  I didn't see a rational order to which
components it scans, but it always saw the /dev/hd* ones before the
/dev/md* ones.
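The kind of filter needed is roughly this, in the devices section of
/etc/lvm/lvm.conf (a sketch: accept the md devices, reject everything
else):

    devices {
        # only scan the md arrays; never touch the raw /dev/hd* partitions
        filter = [ "a|^/dev/md.*|", "r|.*|" ]
    }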

3) Creating the initrd with mkinitrd under 2.4.x resulted in the RAID
being assembled using devfs device names.  You'd think this would work
fine since the device nodes are copied into the initrd too.  For some
reason it didn't for me.

Once under 2.6, mkinitrd would create an initrd without the RAID at all,
because LVM had bypassed the RAID device and used the underlying
components.  So the initrd would be created with no mdadm stuff.  "oops".

4) LVM seemed to be consistent in which "side" of the mirror it chose,
so I booted into a CD that supported RAID+LVM and failed the drives that
LVM chose.
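Failing a mirror half from the rescue CD is just mdadm manage mode
(device names below are purely illustrative for this box):

    # kick out the half that LVM had been writing to directly; it gets
    # resynced from the good half once it is added back later
    mdadm /dev/md6 --fail /dev/hda6 --remove /dev/hda6
    mdadm /dev/md7 --fail /dev/hda7 --remove /dev/hda7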

5) Continued trying to make LVM work.  After much playing (I had the
syntax wrong for the filter) I finally got it to ignore all but
/dev/md?.  Then I made a new initrd with mkinitrd, expanded the initrd,
manually added the RAID stuff, rebooted/edited a few times to fix stupid
mistakes, realized that LVM had suddenly switched which drives it was
stealing for itself <sigh> so my saved mkinitrd work area kept switching
from one version to another, but finally got a working initrd.
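The "RAID stuff" added by hand amounts to assembling the arrays before
the VG is activated; something along these lines (a sketch with
illustrative device names, not the literal script):

    # assemble the two arrays backing shaktivg, then bring up the VG
    mdadm --assemble /dev/md6 /dev/hda6 /dev/hdc6
    mdadm --assemble /dev/md7 /dev/hda7 /dev/hdc7
    vgchange -a y shaktivg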

6) Added the failed drives back into the mirror and resynched.
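Re-adding is the reverse (again, illustrative device names); the kernel
resyncs the mirrors in the background:

    mdadm /dev/md6 --add /dev/hda6
    mdadm /dev/md7 --add /dev/hda7
    # watch the resync progress
    cat /proc/mdstat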


So, what prompted all this?

Needed to hook up an additional fax modem.  Purchased a USB<-->serial
adapter (Prolific 2303 based).  Could not get it to work on 2.4.x.
Thought I'd try it on 2.6.

Ran out of space in root due to bloat in /lib/modules/*.  I resized the
root LV, which was at 100%.  /etc/lvm/* is on root (/), which was full,
so a backup of the VG could not be created.  No biggie, I thought; never
had problems before.  My UPS then decided to take a dump (turned out to
be a battery problem).  Hooked up the new UPS (already had the new one,
just no scheduled downtime to hook it up) and tried to boot.  Unable to
mount root.  Booted from a recovery CD and inspected.  XFS knows root is
supposed to be 250M (new size) but LVM is showing it as only 150M (old
size).  Since the VG backup is in root I can't restore the metadata.  (I
now keep another copy in /var and will probably also keep a copy in
/boot, which has no LVM dependency.)
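The extra metadata copies are just vgcfgbackup runs pointed somewhere
that doesn't live on the root LV; a sketch (the paths are my choice,
nothing special about them):

    # keep VG metadata copies off the root LV
    vgcfgbackup -f /var/backups/lvm/shaktivg.vg shaktivg
    vgcfgbackup -f /boot/shaktivg.vg shaktivg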

I had a level 0 amanda backup of root from the day before, so I did a
manual restore of that from a boot CD that supports xfs+lvm.  I had to
reinstall some of the packages I had put in for 2.6 support (udev,
hotplug, etc.).

The rest of the saga is above.

I'm not sure there is much the debian packages can do to do a better job
in my case.  The biggest problems I ran into were:

1) No real /dev when not running devfs, which caused all kinds of
problems, especially for the initrd.

2) LVM stealing the underlying components of the RAID1 device.  This is
really scary because it 'works' but will result in corruption.  I had to
xfs_repair my filesystems -- luckily I made no real changes to the FSes
while in this state, so it didn't cause me too many problems.  This is
fairly silent, though, and the system appears to work.  Only because I
was having other problems did I notice it and spend the time/effort
addressing it.  The only real indication of a problem (other than
symptoms down the road) was the duplicate UUID warnings from the lvm
tools.
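A quick way to catch this, beyond watching for the duplicate UUID
warnings, is to check which physical devices LVM actually latched onto
(a sketch):

    # if these list /dev/hd* partitions instead of /dev/md* devices,
    # LVM has grabbed the mirror halves directly
    pvscan
    pvs -o pv_name,vg_name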

-- 
 -Rupa
