
Strange console-related hang at boot-time



  Hi all --

  Please let me know if this is the wrong forum for this query --
as you'll see by reading it, I don't yet know if it should be a
bug report, but on the other hand, it's not really an installer
question. 

  I have a weird problem with a Debian "etch" file server.  I've googled
around and searched list archives, but I'm not finding anything helpful.

  The system is a dual Opteron 242 machine with 4G of RAM.  Two 250G
hard drives are mirrored as RAID1 arrays (there are two partitions on
each drive) holding the "boot" and root filesystems, and a 3ware 9000
controller with eight more drives in RAID5 provides a 2.6 TB-capacity
array.  That array holds the files this server exports over NFS,
mainly user accounts on our system.

  A few days ago, one of the RAID1 disks failed and apparently took
down the system, which then attempted to reboot.  The reboot hung part
way through the process -- it seemed to be somewhere in the
/etc/rc2.d/S20 scripts, and the last message was from knfsd, which
looked like it might be hung.

  I replaced the failed disk and sync'd up the RAID1, and it's still
got the same problem.  It's mounting the root filesystem OK, but
it's not finishing the start-up scripts.

  The weird part is, under some circumstances (see below), I can boot
it to single-user, and if I then run all the services in /etc/rc2.d/
manually, they all start up just fine.  That's the condition it's in 
now, as a work-around to this problem, but I'm not happy with it,
of course.
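
  One way to narrow down which script stalls might be to run the
start scripts one at a time with a time limit, rather than by hand.
The following is only a rough sketch: it assumes a Python 3
interpreter is available on the box (not a given on etch), that the
scripts run in plain lexical order of their S* names, and that each
accepts a "start" argument.  The names start_scripts and run_each and
the 60-second limit are made up for illustration.

```python
import subprocess
from pathlib import Path


def start_scripts(rc_dir):
    """Return the S* entries of rc_dir, sorted lexically by name."""
    return sorted(p for p in Path(rc_dir).iterdir() if p.name.startswith("S"))


def run_each(rc_dir, timeout=60):
    """Run each start script with 'start'; return the first one that times out."""
    for script in start_scripts(rc_dir):
        # Announce the script before running it, so a hang is attributable.
        print(f"starting {script.name}", flush=True)
        try:
            subprocess.run([str(script), "start"], timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            print(f"{script.name} did not finish within {timeout}s")
            return script
    return None  # every script returned within the time limit
```

Calling run_each("/etc/rc2.d") as root from single-user mode would
then name the first script that fails to return, if any does.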

  My first guess was some kind of file system damage from the crash,
corrupting one or more of the start-up files, but since manual start-up
works fine, this seems unlikely.

  However, there's a strange connection with the console devices. The
system normally talks to a console server via ttyS0, but it won't even
boot to single user unless I leave the "console=ttyS0,115200" argument
out of the boot string.  
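
  To confirm which console arguments the running kernel was actually
given, the boot string can be read back from /proc/cmdline.  A small
sketch follows, assuming Python 3; the function name parse_consoles is
invented here, and the root=/dev/md0 in the usage note is only a
placeholder, not this machine's real boot string.

```python
def parse_consoles(cmdline):
    """Return (device, options) pairs for every console= argument, in order."""
    consoles = []
    for arg in cmdline.split():
        if arg.startswith("console="):
            # "ttyS0,115200" splits into device "ttyS0" and options "115200";
            # an argument without a comma yields an empty options string.
            device, _, options = arg[len("console="):].partition(",")
            consoles.append((device, options))
    return consoles
```

For example, parse_consoles("root=/dev/md0 ro console=ttyS0,115200")
yields one pair, ("ttyS0", "115200"); on the live system the input
would be the contents of /proc/cmdline.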

  That's not the whole story either, though, because even if I leave 
the "console=ttyS0,115200" argument out of the boot string, it also
won't boot to run level 2.  It hangs in roughly the same place, with
some but not all of the /etc/rc2.d/S20* scripts giving messages on
the console.  Of course, run-level 2 starts up a bunch of virtual
consoles.
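
  Since the gettys for those consoles come from /etc/inittab, it may
be worth listing exactly what init respawns in run level 2.  The
sketch below assumes the usual sysvinit line format of
id:runlevels:action:process and a Python 3 interpreter; the function
name respawn_entries is hypothetical.

```python
def respawn_entries(inittab_text, runlevel="2"):
    """Return (id, process) for each respawn line active in the given runlevel."""
    entries = []
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":", 3)  # the process field may itself contain ':'
        if len(fields) != 4:
            continue  # not a well-formed inittab entry
        ident, levels, action, process = fields
        if action == "respawn" and runlevel in levels:
            entries.append((ident, process))
    return entries
```

Feeding it the text of /etc/inittab should show one getty per virtual
console, plus a serial getty if one is enabled for ttyS0.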

  So, in the end, it seems as though there's some kind of problem with
starting consoles, virtual or ttyS0, that's making the start-up process
hang.  This may of course involve corrupted files somehow, but I'm not
sure where to look for problems.

  Are there start-up-sequence experts who can help me with this?

  Thanks.
		
				-- A. 
-- 
Andrew Reid / reidac@bellatlantic.net


