2.6.16, udev and scsi devices
Hi list,
I have a Dell PE1850 system that I just dist-upgraded from woody to sarge.
I am having major problems understanding how to get a 2.6 kernel working
on this thing, mainly I think due to some strangeness of udev.
Background:
The system currently runs a custom-built 2.4.28 kernel.
I want a 2.6 kernel because users running jobs that need to address over
2 GB of memory are having problems.
The system boot disks are a hardware-mirrored pair of SCSI disks controlled
by a Dell PERC4e/DI raid controller (PCI ID 1028:0013 (rev 06)).
There are three other disk devices, RAID boxes connected with LSI/Symbios
FC929X cards. The boot device is /dev/sda; in /etc/fstab I have
/dev/sda2  /     ext3  errors=remount-ro  0  1
/dev/sda3  none  swap  sw                 0  0
/dev/sda5  /usr  ext3  defaults           0  2
/dev/sda6  /var  ext3  defaults           0  2
..etc...
lspci output of all the system devices is at the bottom of this note.
First, I tried the stock sarge 2.6.8-3-686-smp kernel. This fails to
pivot_root. This seems to be a common problem; see Debian bugs #309123,
#332663, #336835 and http://lists.debian.org/debian-boot/2006/03/msg00680.html.
The root cause seems to be that the megaraid driver in kernel.org's
2.6.8 was not working correctly. I'd appreciate any clarification there.
Next, I tried backports.org's 2.6.16-1-686-smp. There were no backports
at all installed before this point. As Les Gray showed that I'd need to,
I upgraded to grub 0.97-12bpo1 and then installed the linux-image.
A quick summary of the relevant packages:
busybox-cvs-static   20040623-1
hotplug              not installed
hotplug-utils        0.0.20020114-7
initramfs-tools      0.68bpo1
libklibc             1.4.11-2bpo1
linux-image          2.6.16-11bpo1
makedev              2.3.1-81bpo1
module-init-tools    3.2.2-2bpo1
udev                 0.093-0bpo1
These were installed at the same time as the kernel (except for
hotplug & hotplug-utils).
The first time I booted this kernel, it detected the system boot disk
(yay) and proceeded to boot. Now the magic starts.
In /boot/grub/menu.lst the relevant entry is
title Debian GNU/Linux, kernel 2.6.16-1-686-smp
root (hd0,1)
kernel /boot/vmlinuz-2.6.16-1-686-smp root=/dev/sda2 ro
initrd /boot/initrd.img-2.6.16-1-686-smp
savedefault
boot
This entry is correct; see the partition tables below.
Grub read it, and did the right thing with it.
The system then proceeds to get in a real tangle, and cannot find
the root partition. It drops me into a busybox shell from where I
can do a few fdisk -l commands:
# fdisk -l /dev/sda
Disk /dev/sda: 1505.9 GB, 1505973239808 bytes
64 heads, 32 sectors/track, 1436208 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1     1436208  1470676976   83  Linux
# fdisk -l /dev/sdd
Disk /dev/sdd: 73.2 GB, 73274490880 bytes
255 heads, 63 sectors/track, 8908 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1           5       40131   de  Dell Utility
/dev/sdd2               6          67      498015   83  Linux
/dev/sdd3              68        1283     9767520   82  Linux swap / Solaris
/dev/sdd4            1284        8908    61247812+   f  W95 Ext'd (LBA)
/dev/sdd5            1284        1648     2931831   83  Linux
/dev/sdd6            1649        1891     1951866   83  Linux
/dev/sdd7            1892        2134     1951866   83  Linux
/dev/sdd8            2135        3350     9767488+  83  Linux
/dev/sdd9            3351        8908    44644603+  83  Linux
It appears that the disk that grub saw as /dev/sda has been renamed,
and is now named /dev/sdd. And /dev/sdd is now /dev/sda.
From this I'm guessing that udev has decided, "no grub, you've got
it all wrong, the numbering order is _this_ way" and changed it,
in mid-boot. Fascinating.
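To see which controller each sdX name actually ended up on, something like
the following could be run from the rescue shell. This is only a rough
sketch: the function name list_scsi_disks is made up for illustration, it
assumes sysfs is mounted on /sys, that /sys/block/sdX/device is a symlink
into the controller's device tree (as I understand it to be on 2.6
kernels), and that the busybox build includes readlink.

```shell
# Print one line per sd* disk: name, size in 512-byte sectors, and the
# sysfs path its "device" link points at (which includes the hostN part).
# The optional first argument is a hypothetical override for the sysfs
# root, so the function can also be tried against a copy of the tree.
list_scsi_disks() {
    sysfs_root="${1:-/sys}"
    for dev in "$sysfs_root"/block/sd*; do
        # Skip the unexpanded glob when no sd* entries exist.
        [ -e "$dev/size" ] || continue
        name="${dev##*/}"
        sectors=$(cat "$dev/size")
        target=$(readlink "$dev/device" 2>/dev/null || echo unknown)
        echo "$name $sectors $target"
    done
}

# Usage (not run here): list_scsi_disks
```

Comparing the host number in the third column with the sizes reported by
fdisk should show whether the internal disk and the FC boxes have swapped
names.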
If I unplug all the FC disks, leaving just the internal SCSI disk,
I get a normal boot from 2.6.16; everything else seems fine and dandy.
The boot log shows the megaraid module (which talks to the Perc4e/DI
controller) is loaded before the mptscsi* modules (that talk to the
FC929 cards). So the detection order seems fixed, and okay.
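For anyone who wants to repeat that check, here is a minimal sketch of how
the relevant lines could be pulled out of the boot log. The name
scsi_boot_order is hypothetical, and the two patterns are assumptions about
how the kernel words these messages ("scsiN : <driver>" when a host adapter
registers, "Attached scsi disk sdX" when a disk gets its name); the exact
wording may differ between kernel versions.

```shell
# Filter kernel log text on stdin down to the lines that show SCSI host
# registration and disk attachment, preserving the order they were logged.
# Both patterns are assumed message formats, not guaranteed ones.
scsi_boot_order() {
    grep -E '^scsi[0-9]+ :|Attached scsi disk'
}

# Usage (not run here): dmesg | scsi_boot_order
```

If the host adapters register in the expected order but the disk names do
not follow it, that would point at something after driver loading.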
Then I halt, cold-plug the FC disks in, boot, and I get the same
reordering behaviour, sda<->sdd. I didn't check for sdb<->sdc.
Questions:
Can anyone advise on how to make udev stop doing this? It's hurting.
Or is there some other weirdness associated with 2.6.16, or busybox,
that is causing the problem, not udev at all?
Cheers
Vince
PS: (lspci; lspci -n) | sort -n
0000:00:00.0 0600: 8086:3590 (rev 09)
0000:00:00.0 Host bridge: Intel Corp. Server Memory Controller Hub (rev 09)
0000:00:02.0 0604: 8086:3595 (rev 09)
0000:00:02.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port A0 (rev 09)
0000:00:04.0 0604: 8086:3597 (rev 09)
0000:00:04.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B0 (rev 09)
0000:00:05.0 0604: 8086:3598 (rev 09)
0000:00:05.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B1 (rev 09)
0000:00:06.0 0604: 8086:3599 (rev 09)
0000:00:06.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port C0 (rev 09)
0000:00:1d.0 0c03: 8086:24d2 (rev 02)
0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1 (rev 02)
0000:00:1d.1 0c03: 8086:24d4 (rev 02)
0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #2 (rev 02)
0000:00:1d.2 0c03: 8086:24d7 (rev 02)
0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02)
0000:00:1d.7 0c03: 8086:24dd (rev 02)
0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
0000:00:1e.0 0604: 8086:244e (rev c2)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
0000:00:1f.0 0601: 8086:24d0 (rev 02)
0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
0000:00:1f.1 0101: 8086:24db (rev 02)
0000:00:1f.1 IDE interface: Intel Corp. 82801EB/ER (ICH5/ICH5R) Ultra ATA 100 Storage Controller (rev 02)
0000:01:00.0 0604: 8086:0330 (rev 06)
0000:01:00.0 PCI bridge: Intel Corp. 80332 [Dobson] I/O processor (rev 06)
0000:01:00.2 0604: 8086:0332 (rev 06)
0000:01:00.2 PCI bridge: Intel Corp. 80332 [Dobson] I/O processor (rev 06)
0000:02:0c.0 0c04: 1000:0626
0000:02:0c.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter
0000:02:0c.1 0c04: 1000:0626
0000:02:0c.1 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter
0000:02:0e.0 0104: 1028:0013 (rev 06)
0000:02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller 4 (rev 06)
0000:03:0b.0 0c04: 1000:0626
0000:03:0b.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter
0000:03:0b.1 0c04: 1000:0626
0000:03:0b.1 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter
0000:05:00.0 0604: 8086:0329 (rev 09)
0000:05:00.0 PCI bridge: Intel Corp. PCI Bridge Hub A (rev 09)
0000:05:00.2 0604: 8086:032a (rev 09)
0000:05:00.2 PCI bridge: Intel Corp. PCI Bridge Hub B (rev 09)
0000:06:07.0 0200: 8086:1076 (rev 05)
0000:06:07.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller (rev 05)
0000:07:08.0 0200: 8086:1076 (rev 05)
0000:07:08.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller (rev 05)
0000:09:05.0 ff00: 1028:0011
0000:09:05.0 ff00: Dell Remote Access Card 4 Daughter Card
0000:09:05.1 ff00: 1028:0012
0000:09:05.1 ff00: Dell Remote Access Card 4 Daughter Card Virtual UART
0000:09:05.2 ff00: 1028:0014
0000:09:05.2 ff00: Dell Remote Access Card 4 Daughter Card SMIC interface
0000:09:06.0 0101: 1095:0680 (rev 02)
0000:09:06.0 IDE interface: Silicon Image, Inc. (formerly CMD Technology Inc) PCI0680 Ultra ATA-133 Host Controller (rev 02)
0000:09:0d.0 0300: 1002:5159
0000:09:0d.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]