
Re: Broadcom TG3 network drops, cannot recover without reboot



On Tue, May 26, 2015, at 09:24, Justin Catterall wrote:
> At irregular times, and apparently for no reason at all, networking
> drops and cannot be restarted without reboot on a fresh install of
> Jessie. The NIC is a Broadcom NetXtreme BCM5720.
> 
> ifconfig thinks networking is still up because I can:
> 	ifconfig eth0 down
> 
> I find this when I try 'ifconfig eth0 up':
> tg3_abort_hw timed out TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff

Hmm, it is either a kernel issue or a hardware issue.

> Any suggestions on where to look for a solution?

Yes.

First, disable all hardware offloading using ethtool.  See if that
helps.
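
A rough sketch of that first step, assuming the interface is eth0 as in
the report and that ethtool is installed.  The function name
disable_offloads and the choice of features are only for illustration;
not every NIC supports every feature, so individual failures are
reported and skipped.

```shell
# Hypothetical helper: turn off the common hardware offloads on one NIC.
# $1: interface name (assumed to be eth0 if omitted).
disable_offloads() {
    iface="${1:-eth0}"

    # Print the current settings first so they can be restored later.
    ethtool -k "$iface"

    # Change one feature per call: a feature the hardware cannot change
    # then fails on its own without blocking the others.
    for feature in rx tx sg tso gso gro; do
        ethtool -K "$iface" "$feature" off ||
            echo "could not change $feature on $iface" >&2
    done
}
```

Run it as root with "disable_offloads eth0".  The settings do not
survive a reboot, which is convenient while testing.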

Also, if this NIC is on the system mainboard, make sure you are using
the latest firmware ("BIOS update") from your motherboard vendor: it is
common for onboard NICs to keep their vital product data and firmware in
a data block of the shared system flash. When both live in the same
flash chip, the motherboard vendor bundles NIC firmware updates with the
BIOS updates.

Make sure you have the latest Linux firmware file for the tg3 driver as
well.  If the initramfs image has the tg3.ko module inside, it must also
have the firmware file.  A workaround for any initramfs-related tg3
firmware loading issues is to run "rmmod tg3 ; modprobe tg3" after the
system has booted (and before the NIC hardlocks).
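
One way to check the initramfs point, sketched under the assumption
that the image was built by initramfs-tools (which provides
lsinitramfs) and lives at the usual /boot/initrd.img-<version> path.
The function name check_tg3_initramfs is made up for this example.

```shell
# Hypothetical helper: if the initramfs ships tg3.ko, report whether it
# also ships every firmware file the module declares.
# $1: initramfs image (assumed default: the running kernel's image).
# Returns 0 if nothing is missing, 1 if a file is missing, 2 on error.
check_tg3_initramfs() {
    initrd="${1:-/boot/initrd.img-$(uname -r)}"
    listing=$(lsinitramfs "$initrd") || return 2

    # Without tg3.ko in the image there is nothing to check.
    if ! printf '%s\n' "$listing" | grep -q '/tg3\.ko'; then
        echo "tg3.ko not in $initrd"
        return 0
    fi

    status=0
    # modinfo -F firmware prints one firmware path per line.
    for fw in $(modinfo -F firmware tg3); do
        if printf '%s\n' "$listing" | grep -qF "$fw"; then
            echo "ok: $fw"
        else
            echo "missing: $fw"
            status=1
        fi
    done
    return $status
}
```

If it reports a missing file, installing the firmware and regenerating
the image with "update-initramfs -u" should be the lasting fix; the
rmmod/modprobe pair is only the stopgap.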

If all of the above fails, get familiar with building a custom
Debian-compatible kernel from pristine upstream sources at
www.kernel.org.  Wait until 3.18.15 and 4.0.5 are released on
www.kernel.org, and build custom kernels based on them.  Alternatively,
wait until a Debian-packaged version of kernel 4.0.5 is available.  DO
NOT use 4.0 kernels before 4.0.5 on pain of possible data loss.
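
For the build itself, a minimal sketch using the deb-pkg target of the
upstream kernel Makefile.  It assumes the tarball from www.kernel.org
is already unpacked, that the build dependencies are installed, and
that the running Debian kernel's config is a sensible starting point.
The function name build_deb_kernel is only for illustration.

```shell
# Hypothetical helper: build .deb packages from an unpacked upstream
# kernel tree.
# $1: path to the source tree.
# $2: config to start from (assumed default: the running kernel's).
build_deb_kernel() {
    src="$1"
    config="${2:-/boot/config-$(uname -r)}"

    cp "$config" "$src/.config" || return 1
    (
        cd "$src" || exit 1
        # Take the default answer for options new in this version.
        make olddefconfig || exit 1
        # Leaves the linux-image-*.deb packages in the parent directory.
        make -j"$(nproc)" deb-pkg
    )
}
```

For example, "build_deb_kernel ~/src/linux-4.0.5" once that release is
out, followed by installing the resulting linux-image package with
dpkg -i.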

If either the 3.18.15 or 4.0.5 kernel fixes the issue with your BCM5720,
please tell us so that we can try to isolate the fix and backport it to
the Debian kernel.

If that fails, you will have to engage the kernel community itself for a
fix.  Please file a bug on bugzilla.kernel.org, and good luck. There are
several hardware hang reports open against BCM57xx + tg3.

Alternatively, try to get yourself an Intel NIC that works with the igb
driver (don't get an Intel NIC that needs the e1000e driver) to replace
the hardlock-prone BCM5720 + tg3 combination.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique de Moraes Holschuh <hmh@debian.org>

