Strange console-related hang at boot-time
Hi all --
Please let me know if this is the wrong forum for this query --
as you'll see by reading it, I don't yet know if it should be a
bug report, but on the other hand, it's not really an installer
issue either.
I have a weird problem with a Debian "etch" file server. I've googled
around and searched list archives, but I'm not finding anything helpful.
The system is a dual Opteron 242 system with 4 GB of RAM, two 250 GB
hard drives in RAID1 arrays (there are two partitions on each
drive) holding the "boot" and root filesystems, and a 3ware 9000
controller with eight more drives on it in RAID5 for a 2.6 TB-capacity
array. The array holds files which this server exports over NFS,
mainly user accounts on our system.
A few days ago, one of the RAID1 disks failed, and apparently took
down the system, which then attempted to reboot, but the reboot hung part
way through the process. It seemed to be in /etc/rc2.d/S20 somewhere;
the last message was from knfsd, which looked like it might be hung.
I replaced the failed disk and sync'd up the RAID1, and it's still
got the same problem. It's mounting the root filesystem OK, but
it's not finishing the start-up scripts.
The weird part is, under some circumstances (see below), I can boot
it to single-user, and if I then run all the services in /etc/rc2.d/
manually, they all start up just fine. That's the condition it's in
now, as a work-around to this problem, but I'm not happy with it.
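For reference, the manual start-up amounts to roughly the sketch
below. It is only an illustration, not what init itself does: the
function names (start_scripts, run_in_order) and the 60-second timeout
are my own hypothetical choices, it assumes a Python 3.5 or later
interpreter is available on the box, and it approximates the start-up
order with a plain sort of the S* names.

```python
import os
import subprocess


def start_scripts(rc_dir):
    """Return the S* entries of rc_dir, sorted by name.

    Assumption: a plain sort of the names approximates the order in
    which the boot process would run them.
    """
    return sorted(name for name in os.listdir(rc_dir) if name.startswith("S"))


def run_in_order(rc_dir, timeout=60):
    """Run each start script with the 'start' argument, one at a time.

    Returns the name of the first script that does not finish within
    `timeout` seconds, or None if every script returns in time.
    """
    for name in start_scripts(rc_dir):
        path = os.path.join(rc_dir, name)
        print("starting", name, flush=True)
        try:
            subprocess.run([path, "start"], timeout=timeout)
        except subprocess.TimeoutExpired:
            print("timed out:", name, flush=True)
            return name
    return None
```

Calling run_in_order("/etc/rc2.d") as root from single-user mode would
name the first script that stalls, if any. Run this way, nothing has
stalled for me, which is what makes the hang at boot so puzzling.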
My first guess was some kind of file system damage from the crash,
corrupting one or more of the start-up files, but since manual start-up
works fine, this seems unlikely.
However, there's a strange connection with the console devices. The
system normally talks to a console server via ttyS0, but it won't even
boot to single user unless I leave the "console=ttyS0,115200" argument
out of the boot string.
That's not the whole story either, though, because even if I leave
the "console=ttyS0,115200" argument out of the boot string, it also
won't boot to run level 2. It hangs in roughly the same place, with
some but not all of the /etc/rc2.d/S20* scripts giving messages on
the console. Of course, run level 2 starts up a bunch of virtual
consoles.
So, at the end of the process, it seems as though there's some kind
of problem with starting consoles, virtual or ttyS0, that's making
the start-up process hang. This may include corrupted files somehow,
of course, but I'm not sure where to look for problems.
Are there start-up-sequence experts who can help me with this?
Andrew Reid / email@example.com