
Bug#419950: NETDEV WATCHDOG problems return



Back in April 2007, I opened this bug because of repeated
"NETDEV WATCHDOG: eth0: transmit timed out" errors.

I found that changing one BIOS setting seemed to make the
problem go away. The setup menu named "Resource Configuration"
offers a setting named "Shared PCI IRQs". If I left it at the
original "Auto", I got the problems described in the bug report
and the network was unusable; if I changed it to "Share Three IRQs",
everything seemed to work OK.

I ran like this for several months without seeing the problem.
Around October/November, in the midst of several kernel revisions,
the problem returned briefly, but before I had time to investigate it,
another kernel upgrade or two seemed to get me back to normal.
(All of these were the standard etch 2.6.18-686 kernels, delivered
to me through security updates.)

This morning I finally got around to installing a big batch of recent
security updates, including a new kernel, and I'm sorry to report that
the NETDEV WATCHDOG network paralysis is back.

Before I rebooted this morning, the machine had been up for 19 days
without problems. That previous boot was on December 18th, running
Linux version 2.6.18-5-686 (Debian 2.6.18.dfsg.1-13etch5).
The new kernel ran OK for approximately 13 hours, then went into
continuous network lockup. The new kernel is:
Linux version 2.6.18-5-686 (Debian 2.6.18.dfsg.1-17).

Here is a short kern.log extract showing the
end of the boot sequence and the start of the lockup:

Jan  6 01:16:26 legba kernel: IPv6 over IPv4 tunneling driver
Jan  6 01:16:26 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed
Jan  6 01:16:26 legba kernel: eth0: Setting full-duplex based on MII#1 link partner capability of 41e1.
Jan  6 01:16:32 legba kernel: eth0: no IPv6 routers present
Jan  6 01:16:33 legba kernel: lp0: using parport0 (interrupt-driven).
Jan  6 01:16:33 legba kernel: ppdev: user-space parallel port driver
Jan  6 04:09:07 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed
Jan  6 04:11:06 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed
Jan  6 04:18:21 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed
Jan  6 14:37:02 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed
Jan  6 14:37:11 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan  6 14:37:11 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed
Jan  6 14:37:19 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan  6 14:37:19 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed
Jan  6 14:37:27 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan  6 14:37:27 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed

These two lines kept repeating every 8 or 12 seconds until I rebooted.
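To check the spacing between timeouts over a longer stretch of kern.log, a small script like the one below could help. It is only a sketch: it assumes the classic syslog prefix "Mon DD HH:MM:SS host ..." (which carries no year, so the year 2008 is an assumed default), and the function name `watchdog_intervals` is hypothetical.

```python
from datetime import datetime

# Substring that marks a watchdog timeout line in kern.log.
WATCHDOG_MARK = "NETDEV WATCHDOG:"


def watchdog_intervals(lines, year=2008):
    """Return the seconds between consecutive NETDEV WATCHDOG lines.

    Assumes syslog-style lines such as
    "Jan  6 14:37:11 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out".
    Syslog omits the year, so the caller supplies it (2008 is an assumption).
    """
    stamps = []
    for line in lines:
        if WATCHDOG_MARK not in line:
            continue
        # The first three whitespace-separated fields are month, day, time.
        month, day, clock = line.split()[:3]
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    # Difference between each timestamp and the one before it.
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "Jan  6 14:37:11 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out",
        "Jan  6 14:37:11 legba kernel: 0000:00:10.0: tulip_stop_rxtx() failed",
        "Jan  6 14:37:19 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out",
        "Jan  6 14:37:27 legba kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    ]
    print(watchdog_intervals(sample))  # prints [8, 8]
```

Run against the full log (for example `watchdog_intervals(open("/var/log/kern.log"))`), it would show whether the gaps stay at 8 and 12 seconds or drift.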

Any suggestions for what I should experiment with are welcome,
and I'll happily provide any other information you might want.
On this latest reboot, I added the kernel parameter "pci=routeirq"
as an experiment of my own, but the box has only been up for
about an hour, so I can't say yet whether it helps.


