
Surprising boot problem with modprobe and a stray named pipe



I had an interesting problem today.  A friend called me up to say that
after an update, his Etch box wouldn't boot anymore, and asked if I
could come by and take a look at it.

It was hanging waiting for udev to settle, and udev was starting tons
of modprobe processes that were just hanging.

Indeed, booting with "init=/bin/sh" I couldn't modprobe anything.  Of
course my first guess was that the kernel was corrupt, modules were
corrupt, there was a version mismatch, etc.

Eventually I tried strace'ing modprobe.  It was hanging trying to read
from a named pipe, /etc/modprobe.d/supervise/control.  At first it
looked like some kind of coordination system, used to let modprobe
wait for something to finish.  But my other Etch machines didn't have
it, and eventually it dawned on me: that directory is supposed to be
somewhere else.  It
looks like something (possibly minor filesystem corruption) caused a
daemontools supervise directory to end up in /etc/modprobe.d, and when
modprobe tried to read its configuration by scanning everything in
that directory, it got stuck trying to read from the named pipe.
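
For anyone who wants to check for the same thing, here is a rough
sketch (not something I ran on that box) of a Python script that walks
a config directory and reports anything that is not a regular file or
directory.  Opening a named pipe for reading blocks until something
opens it for writing, which is why a config scanner can hang on one.
The function name find_special_files is made up for illustration, and
the script uses lstat so it never opens the files it inspects.

```python
import os
import stat


def find_special_files(root):
    """Return a sorted list of (path, kind) for FIFOs, sockets and
    device nodes found anywhere under root.

    Only lstat() is used, so nothing is opened and the scan cannot
    block on a named pipe the way a reader would.
    """
    kinds = [
        (stat.S_ISFIFO, "fifo"),
        (stat.S_ISSOCK, "socket"),
        (stat.S_ISCHR, "char device"),
        (stat.S_ISBLK, "block device"),
    ]
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Special files show up in filenames (anything not a directory).
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # vanished or unreadable; skip it
            for check, kind in kinds:
                if check(mode):
                    found.append((path, kind))
                    break
    return sorted(found)


if __name__ == "__main__":
    # /etc/modprobe.d is the directory from the story above.
    for path, kind in find_special_files("/etc/modprobe.d"):
        print(f"{kind}: {path}")
```

On a healthy system this should print nothing; any output is worth a
closer look before blaming the kernel or the modules.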

It was a pretty bizarre failure, and the symptoms didn't point at the
cause at all.  If I hadn't noticed the stray pipe, we probably would
have ended up doing an OS reinstall.

Just thought I'd share a sysadmin war story,

----Scott.

