Re: Tracing silent crashes
Thanks to all who replied. I was able to take a monitor to the machine and
discovered that there was an error in the NTP configuration (I'm using a
GPS-disciplined oscillator for the timecode, and was using the kernel PPS
interface patches) that was causing some sort of meltdown. I've posted a
message with the gory details to the NTP mailing list, so I'll spare you here.
But thanks in particular for the hints on network syslog and using a
console terminal. I'm going to implement some combination of those to make
future problems easier to solve.
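
For the network syslog half of that, here is a rough sketch of a receiver that could run on a second machine, so that whatever the headless box logs just before it goes quiet survives elsewhere. It is only an illustration: the names decode_pri, format_record and serve, the file remote.log and the unprivileged port 5514 are all made up for this example (the standard syslog port is 514, which needs root to bind). On the sending side, classic sysklogd forwards with a line such as "*.* @loghost" in /etc/syslog.conf, where loghost is a placeholder for the receiving machine.

```python
import socketserver
import time

# Standard syslog facility and severity names, indexed by number.
FACILITIES = ("kern", "user", "mail", "daemon", "auth", "syslog",
              "lpr", "news", "uucp", "cron", "authpriv", "ftp")
SEVERITIES = ("emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug")


def decode_pri(line):
    """Split '<PRI>message' into (facility, severity, message)."""
    if line.startswith("<") and ">" in line[:5]:
        end = line.index(">")
        if line[1:end].isdigit():
            facility, severity = divmod(int(line[1:end]), 8)
            if facility < len(FACILITIES):
                name = FACILITIES[facility]
            else:
                name = "facility%d" % facility
            return name, SEVERITIES[severity], line[end + 1:]
    # No usable priority prefix: fall back to user.notice.
    return "user", "notice", line


def format_record(host, payload, now):
    """Turn one received datagram into a single log line."""
    text = payload.decode("utf-8", "replace").rstrip("\r\n\0")
    facility, severity, message = decode_pri(text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    return "%s %s %s.%s %s\n" % (stamp, host, facility, severity, message)


def serve(path="remote.log", address=("0.0.0.0", 5514)):
    """Append every datagram received on `address` to `path` (hypothetical defaults)."""

    class Handler(socketserver.BaseRequestHandler):
        def handle(self):
            payload = self.request[0]  # for UDP servers, request is (data, socket)
            with open(path, "a") as log:
                log.write(format_record(self.client_address[0], payload, time.time()))

    with socketserver.UDPServer(address, Handler) as server:
        server.serve_forever()
```

Calling serve() blocks and appends one line per message. The file is reopened for each datagram so that every entry reaches disk promptly, which matters more here than throughput.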
--On Sunday, January 18, 2004 14:45:38 +0100 Michael Bergbauer
On Sun Jan 18, 2004 at 08:33:02AM -0500, John Ackermann N8UR wrote:
I have a remote machine running Debian testing and kernel 2.4.21, that
operates in headless mode (no keyboard or monitor attached). At random
times, it seems to die, at least as far as any network connectivity is
concerned (the NICs are SMC 9342 using the epic100 driver). It simply
stops responding to any network request. I have a clue (difficult to
verify because of the remote location) that the machine doesn't actually
crash, and that the local console remains accessible; in other words, it
may just be a freeze of the networking stack.
There doesn't seem to be any correlation to time of day, and sometimes
I'll go weeks without this happening, while at other times it may be a
daily occurrence. The machine is on a UPS, so it's probably not power
glitch related. I've swapped NIC units, though not varieties. And,
it's been happening for a while, though I run apt-get dist-upgrade
fairly regularly, and across kernel versions, so I don't think it's due
to any new software change.
Upon reboot things return to normal and there's no trace of anything in
the logs to indicate what the problem is.
I guess I have two questions -- does anyone recognize this problem, and
is there any way to capture more data that might give me a clue about
what's happening? The normal log files don't yield a clue.
Any chance to attach a serial console to the machine? Some serial
concentrator in the rack where you could get plugged in at least for
fixing that bug? Another box of yours in the same rack? So you could
set up this box to support serial console and get all the console output
(including kernel oopses and panics) plus magic SysRq via the serial line.
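
As a rough sketch of the capturing side on the neighbouring box: the snippet below timestamps whatever arrives on the serial line so that an oops can later be matched against the time the network went quiet. It assumes the headless machine boots with something like console=ttyS0,9600n8 on its kernel command line and that the receiving port has already been set to the same speed (for instance with stty). The function name stamp_lines and the paths in the usage comment are hypothetical.

```python
import time


def stamp_lines(stream, clock=time.time):
    """Yield each non-empty console line from `stream` prefixed with a UTC timestamp.

    `stream` is any iterable of byte strings, e.g. a serial device opened in
    binary mode, which yields one line per iteration.
    """
    for raw in stream:
        text = raw.decode("ascii", "replace").rstrip("\r\n")
        if not text:
            continue  # skip blank lines produced by bare CR/LF pairs
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(clock()))
        yield "%s %s" % (stamp, text)


# Hypothetical usage; the device and log file names are assumptions:
#
#   with open("/dev/ttyS0", "rb", buffering=0) as port, \
#        open("console.log", "a") as log:
#       for line in stamp_lines(port):
#           log.write(line + "\n")
#           log.flush()  # keep the log current in case this box dies too
```

Keeping the timestamping separate from the file handling makes it easy to try the function on canned input before pointing it at a real port.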
Michael Bergbauer <email@example.com>
use your idle CPU cycles - See http://www.distributed.net for details.
Visit our mud Geas at geas.franken.de Port 3333