Surprising boot problem with modprobe and a stray named pipe
I had an interesting problem today. A friend called me up to say that
after an update, his Etch box wouldn't boot anymore, and asked whether
I could come by and take a look at it.
It was hanging while waiting for udev to settle, and udev was starting
tons of modprobe processes that all just hung.
Indeed, even after booting with "init=/bin/sh" I couldn't modprobe
anything. Of course my first guesses were a corrupt kernel, corrupt
modules, a version mismatch, etc.
Eventually I tried strace'ing modprobe. It was hanging while trying to
read from a named pipe, /etc/modprobe.d/supervise/control. At first
that looked like some kind of coordination mechanism, used to make
modprobe wait for something to finish. But my other Etch machines
didn't have it, and eventually it dawned on me: that directory is
supposed to be somewhere else. Something (possibly minor filesystem
corruption) seems to have caused a daemontools supervise directory to
end up in /etc/modprobe.d. When modprobe read its configuration by
scanning everything in that directory, it got stuck on the named pipe,
since opening a pipe for reading normally blocks until some other
process opens it for writing, and nothing ever did.
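A cheap way to catch this kind of problem is to look for anything under a configuration directory that isn't a regular file. Here's a minimal Python sketch of that check. The function name find_non_regular is just something made up for illustration, and it assumes you want subdirectories flagged as well, since in this case the pipe sat inside a stray subdirectory. It only stat()s entries and never opens them, so unlike modprobe it can't get stuck on a named pipe.

```python
import os
import stat
import sys


def find_non_regular(top):
    """Return (path, kind) pairs for everything under `top` that is not a
    regular file. Entries are only stat()ed, never opened, so a named pipe
    is reported instead of blocking the caller."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Sort in place so the report order (and walk order) is stable.
        dirnames.sort()
        for name in dirnames:
            # A config directory normally shouldn't contain subdirectories.
            problems.append((os.path.join(dirpath, name), "directory"))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode  # follows symlinks
            except OSError as exc:
                # e.g. a dangling symlink
                problems.append((path, "unreadable (%s)" % exc.strerror))
                continue
            if stat.S_ISREG(mode):
                continue
            if stat.S_ISFIFO(mode):
                kind = "named pipe (FIFO)"
            elif stat.S_ISSOCK(mode):
                kind = "socket"
            elif stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
                kind = "device node"
            else:
                kind = "other special file"
            problems.append((path, kind))
    return problems


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/etc/modprobe.d"
    for path, kind in find_non_regular(target):
        print("%s: %s" % (path, kind))
```

Run it with a directory argument (it defaults to /etc/modprobe.d). On a box in the state described above, it would be expected to list the supervise directory and the control pipe inside it.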
It was a pretty bizarre failure, and the symptoms didn't point to the
cause at all. If I hadn't noticed that pipe, we probably would have
ended up reinstalling the OS.
Just thought I'd share a sysadmin war story,
----Scott.