
Re: Tracing silent crashes

On Sun Jan 18, 2004 at 08:33:02AM -0500, John Ackermann N8UR wrote:
> I have a remote machine running Debian testing and kernel 2.4.21, that 
> operates in headless mode (no keyboard or monitor attached).  At random 
> times, it seems to die, at least as far as any network connectivity is 
> concerned (the NICs are SMC 9342 using the epic100 driver).  It simply 
> stops responding to any network request.  I have a clue (difficult to 
> verify because of the remote location) that the machine doesn't actually 
> crash, and that the local console remains accessible; in other words, it 
> may just be a freeze of the networking stack.
> There doesn't seem to be any correlation to time of day, and sometimes I'll 
> go weeks without this happening, when other times it may be a daily 
> occurrence.  The machine is on a UPS, so it's probably not power glitch 
> related.  I've swapped NIC units, though not varieties.  And, it's been 
> happening for a while, though I run apt-get dist-upgrade fairly regularly, 
> and across kernel versions, so I don't think it's due to any new software 
> change.
> Upon reboot things return to normal and there's no trace of anything in the 
> logs to indicate what the problem is.
> I guess I have two questions -- does anyone recognize this problem, and is 
> there any way to capture more data that might give me a clue about what's 
> happening?  The normal log files don't yield a clue.

Any chance of attaching a serial console to the machine? Is there a
serial concentrator in the rack that you could plug into, at least
while tracking down this bug? Or another box of yours in the same rack?
Then you could set this box up to support a serial console and get all
the console output (including kernel oopses and panics) + magic SysRq
via the serial
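
As a rough sketch of the serial console setup suggested above: the
fragment below assumes the first serial port (ttyS0) at 9600 baud, LILO
as the boot loader, and a kernel built with CONFIG_MAGIC_SYSRQ. None of
these details are given in the original message, so adjust them to the
actual machine.

```
# /etc/lilo.conf (assumed boot loader); rerun /sbin/lilo after editing
serial=0,9600n8
append="console=tty0 console=ttyS0,9600n8"

# /etc/inittab: login prompt on the serial line
T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100

# /etc/sysctl.conf: enable magic SysRq (needs CONFIG_MAGIC_SYSRQ)
kernel.sysrq = 1
```

With two console= entries, kernel messages go to both devices and the
last one listed becomes /dev/console. The box on the other end of the
null-modem cable only needs a terminal program such as minicom that logs
to a file. Over a serial console, SysRq is sent as a BREAK followed
within a few seconds by the command key (for example t for a task dump).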

Michael Bergbauer <michael@noname.franken.de>
use your idle CPU cycles - See http://www.distributed.net for details.
Visit our mud Geas at geas.franken.de Port 3333
