
Re: NETDEV WATCHDOG error. Net dies. Why?



Stephen Gran wrote:
This one time, at band camp, Erik Persson said:
About one week ago we suddenly began to get

kernel: NETDEV WATCHDOG: eth1: transmit timed out

error messages, and all traffic on eth1 just died shortly thereafter.

We rebooted the machine and the messages disappeared for 2 days, but today the problem appeared again. I have no idea what makes this happen.

Does anyone have any idea about what could be the problem?

You have a flaky:
A) NIC
B) cabling/hub/switch
C) kernel module

Upgrading the kernel is free, so I suggest that as a first step.  If
that fails, start swapping hardware.

Thanks!

I'll try upgrading the kernel. We have a spare switch that I could try as well.

The strange part is that the router has been running for a couple of months under full load without this problem. Even though the other interfaces are on the same 4-port NIC, the problem seems to affect only eth1, and the load is about the same on at least eth0.
That makes me wonder if it's not a hardware problem.
Sadly, it is hard to troubleshoot when the problem is so infrequent; the router has been running since yesterday and there have been no "netdev watchdog" error messages since then. Thus, whatever I do, the problem may still be there and surface in a week or two :-/ It's certainly not very pleasant to have a proven unstable system and not be able to pinpoint the error.
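
Since the problem shows up so rarely, one thing that might help is counting the watchdog messages per interface in the saved kernel logs, to confirm over time that only eth1 is affected. A minimal sketch, assuming the log lines look like the one quoted above; the function name count_watchdog_timeouts is made up for illustration and is not part of any tool:

```python
import re
from collections import Counter

# Assumes the message format quoted earlier in this thread:
#   kernel: NETDEV WATCHDOG: eth1: transmit timed out
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\S+): transmit timed out")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to number of timeouts seen.

    `lines` can be any iterable of log lines, such as an open log file.
    (Hypothetical helper, for illustration only.)
    """
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open log file and printing the result after each incident would show whether the timeouts really stay confined to one port, which may help narrow things down between the NIC, the cabling/switch and the kernel module.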

/Erik Persson.


