Stephen Gran wrote:
> This one time, at band camp, Erik Persson said:
> > About one week ago we suddenly began to get
> > kernel: NETDEV WATCHDOG: eth1: transmit timed out
> > error messages, and all traffic on eth1 just died shortly thereafter.
> > We rebooted the machine and the messages disappeared for 2 days, but today the problem appeared again. I have no idea what makes this happen.
> > Does anyone have any idea about what could be the problem?
>
> You have a flaky:
> A) NIC
> B) cabling/hub/switch
> C) kernel module
>
> Upgrading the kernel is free, so I suggest that as a first step. If that fails, start swapping hardware.
Thanks! I'll try upgrading the kernel. We have a spare switch that I could try as well.
The strange part is that the router has been running for a couple of months under full load without this problem. Even though the other interfaces are on the same 4-port NIC, the problem seems to affect only eth1, and the load is about the same on eth0 at least.
That makes me wonder if it's not a hardware problem.

Sadly it is hard to troubleshoot when the problem is so infrequent; the router has been running since yesterday and there have been no "netdev watchdog" error messages since then. Thus whatever I do, the problem may still be there and surface in a week or two :-/ It's certainly not very pleasant to have a proven unstable system and not be able to pinpoint the error.
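
Since the timeouts show up so rarely, one way to keep an eye on them is to count the watchdog messages per interface in the kernel log. Below is a rough Python sketch of that idea. It assumes the log lines contain the message text exactly as quoted above ("NETDEV WATCHDOG: eth1: transmit timed out"). The function name count_watchdog_timeouts and the regular expression are made up for illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Assumption: log lines contain the message exactly as quoted in this thread,
# e.g. "kernel: NETDEV WATCHDOG: eth1: transmit timed out".
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name -> number of timeout messages.

    `lines` is any iterable of log lines (hypothetical helper, for illustration).
    """
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be fed an open log file, for example count_watchdog_timeouts(open(path)), where path is wherever kernel messages end up on the router (that location varies between systems, so it is left out here). A count that stays at zero for eth0 while eth1 keeps climbing would at least confirm that only one port is affected.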
/Erik Persson.