
Tracing silent crashes



I have a remote machine running Debian testing and kernel 2.4.21, which operates in headless mode (no keyboard or monitor attached). At random times it seems to die, at least as far as network connectivity is concerned (the NICs are SMC 9342 cards using the epic100 driver). It simply stops responding to any network request. I have some indication (difficult to verify because of the remote location) that the machine doesn't actually crash and that the local console remains accessible; in other words, it may just be a freeze of the networking stack.

There doesn't seem to be any correlation with time of day. Sometimes I'll go weeks without this happening, while at other times it may be a daily occurrence. The machine is on a UPS, so it's probably not related to power glitches. I've swapped the NICs for other units, though not for a different model. It has also been happening for a while, across kernel versions and even though I run apt-get dist-upgrade fairly regularly, so I don't think it's due to any recent software change.

Upon reboot, things return to normal, and there's no trace of anything in the logs to indicate what the problem is.

I guess I have two questions: does anyone recognize this problem, and is there any way to capture more data that might give me a clue about what's happening? The normal log files don't yield a clue.
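
To make the second question concrete, one thing I could imagine trying is a small script that periodically appends interface and interrupt counters to a local file and forces each write to disk, so that the last entries before a freeze survive the reboot. This is only a rough sketch in Python, not something I have running: the function names, the log path and the interval are made up for illustration, and I'm assuming the /proc files listed exist on this kernel.

```python
import os
import time


def snapshot(sources, log_path):
    """Append a timestamped copy of each source file to log_path.

    The log is flushed and fsync'ed so the entry reaches the disk
    even if the machine has to be power-cycled shortly afterwards.
    Returns the number of lines-groups written (1 header + 2 per source).
    """
    parts = [time.strftime("=== %Y-%m-%d %H:%M:%S ===")]
    for src in sources:
        try:
            with open(src) as f:
                body = f.read()
        except OSError as exc:
            # Record the failure instead of aborting the whole snapshot.
            body = "unreadable: %s\n" % exc
        parts.append("--- %s ---" % src)
        parts.append(body.rstrip("\n"))
    with open(log_path, "a") as log:
        log.write("\n".join(parts) + "\n")
        log.flush()
        os.fsync(log.fileno())
    return len(parts)


def run_forever(interval=60, log_path="/var/log/netwatch.log"):
    """Take a snapshot every `interval` seconds (hypothetical defaults)."""
    # Assumed to exist on this kernel: per-interface packet/error
    # counters, per-IRQ interrupt counts, and the load average.
    sources = ["/proc/net/dev", "/proc/interrupts", "/proc/loadavg"]
    while True:
        snapshot(sources, log_path)
        time.sleep(interval)
```

Calling run_forever() from an init script (or calling snapshot() once a minute from cron) would leave a trail showing whether the interrupt count for the NIC and the packet counters were still advancing just before the machine dropped off the network.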

Thanks,

John


