
Bug#273182: A fix, and diagnosis of some udev issues.



G'day,

I hit this when migrating to udev on a system with SATA HDDs used for an
md0 RAID0 filesystem mounted as /var/local (non-root).

IMHO the udev excuse of "make the kernel maintainers compile md into the
kernel as a workaround" is a very bad idea. The problem is definitely an
interaction issue with the kernel, udev, and mdadm. They are equally to
blame, so the fix/workaround will require them to co-operate on it.

Here are the things required to make it work on my system:

1) Add the SATA module specific to my hardware (sata_sil)
to /etc/modules.

2) Add the following to /etc/udev/links.conf:

  # create md devices.
  M md0           b 9 0
  M md1           b 9 1
  M md2           b 9 2
  M md3           b 9 3

3) Modify /etc/init.d/mdadm-raid as follows:

--- pirli.3/init.d/mdadm-raid Fri, 10 Dec 2004 22:41:39 +1100 
+++ pirli.5(w)/init.d/mdadm-raid Wed, 29 Dec 2004 21:13:58 +1100
@@ -19,6 +19,8 @@
 case "$1" in
     start)
        if [ "x$AUTOSTART" = "xtrue" ] ; then
+           # give udev time to create /dev/sd* devices
+           sleep 3s
             if [ ! -f /proc/mdstat ] && [ -x /sbin/modprobe ] ; then
                 /sbin/modprobe -k md > /dev/null 2>&1
             fi


And here is a diagnosis and analysis of each fix:

1) On my system, the sata_sil driver is automatically detected and
loaded by hotplug. It seems recent versions of discover also detect and
load it (older versions didn't). Unfortunately, this module needs to be
loaded before mdadm-raid's rc script runs at priority 25
(/etc/rcS.d/S25mdadm-raid). hotplug runs at priority 40, and discover
runs at 35, so they are both too late in the boot process. /etc/modules
is loaded by module-init-tools at priority 20. People using root-raid
will already have made sure the required modules are loaded in the
initrd, so they will not hit this.
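
To check this ordering on a given system, something like the following
sketch may help. It assumes the symlinks in /etc/rcS.d follow the usual
SNNname pattern; only S25mdadm-raid is confirmed above, the other names
are inferred from the priorities quoted. rc_order is a made-up helper
name, not part of any package.

```shell
# rc_order [DIR]
# Print the boot scripts discussed above in the order they would run.
# DIR defaults to /etc/rcS.d; the script names are assumptions based on
# the priorities quoted in this message.
rc_order() {
    dir="${1:-/etc/rcS.d}"
    ls "$dir" \
        | grep -E '^S[0-9]{2}(module-init-tools|mdadm-raid|checkfs\.sh|discover|hotplug)$' \
        | LC_ALL=C sort
}
```

If mdadm-raid is listed before whatever loads the disk driver, the
driver is being loaded too late for the array to assemble.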

Another possible solution would be to change mdadm-raid's priority so
that it runs after discover, but then checkfs.sh at priority 30 will
fail... I wonder if you should be checking filesystems before required
modules are loaded (by discover)... Note checkfs.sh only checks non-root
partitions, as root is already mounted at that point. Perhaps a policy
of "all devices should be loaded before non-root filesystems can be
checked/mounted" would be a good idea?

2) It seems that the combination of the md kernel module and mdadm do
not give udev the required notification to create the /dev/md* devices.
It is not that they are created too late for mdadm-raid... they are not
created at all. The discussion here suggests that there is a kernel bug
contributing, and various docs suggest some modules do not or cannot
give the required notification. It seems the md module is one of these.
Fortunately udev has a workaround for devices that don't have support
for correctly notifying udev, /etc/udev/links.conf.
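
If more than a handful of arrays are needed, the links.conf entries can
be generated rather than typed. A small sketch, assuming the same
"M name b major minor" layout as the lines in fix 2 and block major 9
for md; md_links is only an illustrative helper name.

```shell
# md_links COUNT
# Print COUNT links.conf entries for md0 .. md(COUNT-1), using block
# major 9 as in the hand-written lines from fix 2.
md_links() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'M %-13s b 9 %d\n' "md$i" "$i"
        i=$((i + 1))
    done
}

# Example: review the output before appending it to the real file.
# md_links 4
```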

3) It seems udev can take a disturbingly long time to create the /dev/*
device files after a module is loaded. This took me a long time to
diagnose... on bootup mdadm kept complaining that it couldn't find the
devices for md0, but when I ran it manually it worked fine. On my system,
sata_sil is loaded using /etc/modules (see fix 1), and when the
mdadm-raid rc script ran, there were no /dev/sd* devices (there
were /dev/md* devices, see fix 2). A delay of at least 3 seconds was
required before these devices appeared and mdadm-raid would work. I
suspect on systems with more partitions this could take longer.

Note that this delay could affect any rc-script that requires /dev/*
device files for modules loaded via /etc/modules. I added the delay
to /etc/init.d/mdadm-raid because that's where it was hurting me.
Ideally modprobe should delay until udev has finished creating the
device files. If this is not possible, probably a better alternative
would be to add the delay to the end of /etc/init.d/module-init-tools.
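
A fixed sleep is only a guess at how long udev needs. As a variation on
the delay (this is not what the patch above does), an rc script could
poll for the device file and give up after a timeout. A sketch only:
wait_for_dev is a hypothetical helper, and /dev/sda in the usage comment
is just an example path.

```shell
# wait_for_dev PATH [TIMEOUT_SECONDS]
# Poll once a second until PATH exists. Returns 0 as soon as it does,
# or 1 if it has not appeared after TIMEOUT_SECONDS (default 10).
wait_for_dev() {
    dev="$1"
    timeout="${2:-10}"
    elapsed=0
    while [ ! -e "$dev" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use near the top of the start) branch:
# wait_for_dev /dev/sda 10 || echo "timed out waiting for /dev/sda" >&2
```

This returns immediately when the device is already there, so it costs
nothing on systems where udev is quick, and it copes with systems that
need longer than 3 seconds.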

So whose bug is it, and who should fix/workaround it?

part 1) is probably a user problem... the user will just need to put the
required modules into /etc/modules. Arguably the rc-script priorities of
discover, hotplug, and checkfs.sh (initscripts package) could be
tweaked to fix this.

part 2) is possibly a kernel bug, or possibly an mdadm bug, but the
cleanest workaround would be to add the required changes
to /etc/udev/links.conf, either as a default or at least a README in the
udev package.

part 3) is definitely a udev issue. Ideally it would be fixed so module
loading with modprobe would wait until the /dev/* files are created. I
find 3 seconds disturbingly long... I realise udev would have to scan
partitions, but 3 seconds! If it was made faster this problem would
probably go away. Failing a proper fix, the cleanest workaround is to
add a delay after the /etc/init.d/module-init-tools rc script runs. As
it's a udev-specific problem, have the udev package add
an /etc/init.d/udev-wait rc script that runs at priority 21 that delays
to give udev time to create the devices.

Given that 2 out of 3 of the fixes/workarounds suggest udev changes, I'd
be inclined to fling this bug back at udev... perhaps also file a
wishlist bug against initscripts to make the device autodetection and
checkfs.sh rc priorities a bit better.

-- 
Donovan Baarda <abo@minkirri.apana.org.au>
http://minkirri.apana.org.au/~abo/



