
Re: Strange boot-time hang (long, sorry)



Andrew Reid wrote:

  Greetings all --

I am having difficulty diagnosing a strange boot-time hang on one of my systems. I have googled around for clues, but I'm coming up empty.

  The system is a file server, it's a dual-core Opteron machine,
4G RAM, root FS on a software RAID1 array and the served files
on a 3ware RAID5 system with a capacity of about 2.6 TB.  The
important services are ssh, nfsd, and nfs, but it's also a CUPS server.

  Several days ago, this machine crashed, probably due to a disk
failure in the RAID1 root-fs array, and subsequent to the disk replacement, was not able to reboot fully into run-level 2, instead
always hanging somewhere in the /etc/rc2.d/S20* part of the start-up
sequence.

The hang-up seemed, from available error messages, to be related to the serial console on ttyS0, so I removed that from the boot line
and from /etc/inittab, but even so, found I was only able to get the
machine into single-user mode. It still will not boot fully into multi-user mode, so it's not clear if the console issue is an important
factor.

If it's the problem I suspect, then this has come up a number of times on the list. You are probably referring to an error message similar to "unable to open initial console." On my systems this happens when the /dev/console device node is missing from the /dev directory before udev starts. (Some posters claim to require /dev/xconsole and/or /dev/ttySx, but for me, /dev/console seems to be sufficient. Note also that I don't use initrd, so that may be a factor.)

My own, possibly faulty, understanding of the problem is that udev provides all the devices the kernel requires _except_ the initial console device. That one must exist as a real device node in the /dev directory of the root filesystem itself, the mount point that udev's /dev pseudo-filesystem later covers.

Consequently, when using udev, if you mirror a root filesystem with the -x option (stay on a single filesystem, which causes the device nodes in the mount point directory to be missed), or if you get some filesystem corruption (which was likely in your case) that destroys /dev/console, then your system becomes unbootable and panic ensues.
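
To make the check above concrete, here is a minimal sketch in Python of how one might test for the node and recreate it. It assumes a Linux system, where /dev/console is character device major 5, minor 1. Looking at /dev on a running system only shows what udev has mounted there, so the sketch assumes the root filesystem has been bind-mounted somewhere else first (for example with "mount --bind / /mnt/rootfs"). The path /mnt/rootfs and the function names are hypothetical, chosen only for illustration, and creating the node requires root.

```python
import os
import stat

# Linux device numbers for /dev/console (character device 5, 1).
CONSOLE_MAJOR, CONSOLE_MINOR = 5, 1


def is_char_device(path, major, minor):
    """Return True if path is a character device with the given numbers."""
    try:
        st = os.lstat(path)
    except OSError:
        return False  # missing or unreadable counts as "not there"
    return (stat.S_ISCHR(st.st_mode)
            and os.major(st.st_rdev) == major
            and os.minor(st.st_rdev) == minor)


def create_console_node(dev_dir):
    """Create dev_dir/console as character device 5,1 (needs root)."""
    path = os.path.join(dev_dir, "console")
    os.mknod(path, 0o600 | stat.S_IFCHR,
             os.makedev(CONSOLE_MAJOR, CONSOLE_MINOR))
    return path


if __name__ == "__main__":
    # Hypothetical location of the bind-mounted root filesystem.
    node = "/mnt/rootfs/dev/console"
    if is_char_device(node, CONSOLE_MAJOR, CONSOLE_MINOR):
        print(node, "is present")
    else:
        print(node, "is missing or has the wrong type")
```

If the node turns out to be missing, calling create_console_node("/mnt/rootfs/dev") as root should put it back. This only addresses the missing-node case described here; it does not rule out other causes of the hang.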

I think it's a basic flaw in Debian which is long overdue for a fix, but I hesitate to draw that conclusion because I am not sure I fully understand the problem yet.


