
Bug#777683: Disabling TSO may avoid the problem



I seem to be seeing the same problem on:

 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux

with:

00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
	Subsystem: Intel Corporation Device 2002
	Flags: bus master, fast devsel, latency 0, IRQ 44
	Memory at fe600000 (32-bit, non-prefetchable) [size=128K]
	Memory at fe628000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at f080 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] PCI Advanced Features
	Kernel driver in use: e1000e

The hang messages started after I rebooted into Jessie's kernel. Previously
the machine had been perfectly happy for years with Wheezy's kernel. The
machine has a second Realtek NIC that continues to work normally.

After a few days, messages like the ones below became more frequent and the
network interface stopped working altogether. After a reboot the interface
worked again, but the messages came back:

[  291.030117] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <88>
  TDT                  <8d>
  next_to_use          <8d>
  next_to_clean        <86>
buffer_info[next_to_clean]:
  time_stamp           <fffff592>
  next_to_watch        <88>
  jiffies              <fffff709>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[  293.030124] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <88>
  TDT                  <8d>
  next_to_use          <8d>
  next_to_clean        <86>
buffer_info[next_to_clean]:
  time_stamp           <fffff592>
  next_to_watch        <88>
  jiffies              <fffff8fd>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[  295.030062] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <88>
  TDT                  <8d>
  next_to_use          <8d>
  next_to_clean        <86>
buffer_info[next_to_clean]:
  time_stamp           <fffff592>
  next_to_watch        <88>
  jiffies              <fffffaf1>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[  295.041303] ------------[ cut here ]------------
[  295.041315] WARNING: CPU: 0 PID: 0 at /build/linux-QZaPpC/linux-3.16.7-ckt11/net/sched/sch_generic.c:264 dev_watchdog+0x236/0x240()
[  295.041317] NETDEV WATCHDOG: eth-office (e1000e): transmit queue 0 timed out
[  295.041319] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc hid_generic usbhid hid x86_pkg_temp_thermal intel_powerclamp intel_rapl kvm_intel kvm iTCO_wdt iTCO_vendor_support ppdev evdev crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ftdi_sio snd_hda_codec_hdmi usbserial lrw gf128mul glue_helper snd_hda_codec_realtek ablk_helper snd_hda_codec_generic psmouse i915 cryptd video snd_hda_intel drm_kms_helper drm pcspkr serio_raw parport_pc snd_hda_controller parport shpchp snd_hda_codec i2c_algo_bit snd_hwdep nuvoton_cir rc_core lpc_ich snd_pcm snd_timer mfd_core mei_me mei i2c_i801 i2c_core snd soundcore processor thermal_sys button w83627ehf hwmon_vid coretemp loop autofs4 ext4 crc16 mbcache jbd2 dm_mod raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul
[  295.041380]  crct10dif_common crc32c_intel ahci libahci libata scsi_mod xhci_hcd ehci_pci ehci_hcd r8169 mii e1000e usbcore ptp usb_common pps_core
[  295.041393] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
[  295.041395] Hardware name:                  /DH67BL, BIOS BLH6710H.86A.0156.2012.0615.1908 06/15/2012
[  295.041396]  0000000000000009 ffffffff8150b405 ffff88031f203e28 ffffffff81067797
[  295.041399]  0000000000000000 ffff88031f203e78 0000000000000001 0000000000000000
[  295.041402]  ffff88030ec78000 ffffffff810677fc ffffffff81777fb8 ffffffff00000030
[  295.041404] Call Trace:
[  295.041406]  <IRQ>  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[  295.041417]  [<ffffffff81067797>] ? warn_slowpath_common+0x77/0x90
[  295.041420]  [<ffffffff810677fc>] ? warn_slowpath_fmt+0x4c/0x50
[  295.041425]  [<ffffffff81074777>] ? mod_timer+0x127/0x1e0
[  295.041430]  [<ffffffff8143eb96>] ? dev_watchdog+0x236/0x240
[  295.041433]  [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[  295.041436]  [<ffffffff81072ae1>] ? call_timer_fn+0x31/0x100
[  295.041439]  [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[  295.041442]  [<ffffffff81074119>] ? run_timer_softirq+0x209/0x2f0
[  295.041445]  [<ffffffff8106c641>] ? __do_softirq+0xf1/0x290
[  295.041448]  [<ffffffff8106ca15>] ? irq_exit+0x95/0xa0
[  295.041451]  [<ffffffff81514455>] ? smp_apic_timer_interrupt+0x45/0x60
[  295.041455]  [<ffffffff8151253d>] ? apic_timer_interrupt+0x6d/0x80
[  295.041456]  <EOI>  [<ffffffff81074a26>] ? get_next_timer_interrupt+0x1d6/0x250
[  295.041465]  [<ffffffff813ddf9f>] ? cpuidle_enter_state+0x4f/0xc0
[  295.041468]  [<ffffffff813ddf98>] ? cpuidle_enter_state+0x48/0xc0
[  295.041472]  [<ffffffff810a7fa8>] ? cpu_startup_entry+0x2f8/0x400
[  295.041475]  [<ffffffff81903071>] ? start_kernel+0x492/0x49d
[  295.041478]  [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[  295.041480]  [<ffffffff81902120>] ? early_idt_handlers+0x120/0x120
[  295.041483]  [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
[  295.041485] ---[ end trace aaf46f7eeccba58f ]---
[  295.041502] e1000e 0000:00:19.0 eth-office: Reset adapter unexpectedly
[  298.763518] e1000e: eth-office NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[  548.999305] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <f3>
  TDT                  <2f>
  next_to_use          <2f>
  next_to_clean        <f3>
buffer_info[next_to_clean]:
  time_stamp           <10000f073>
  next_to_watch        <f3>
  jiffies              <10000f2f7>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[  550.999203] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <f3>
  TDT                  <2f>
  next_to_use          <2f>
  next_to_clean        <f3>
buffer_info[next_to_clean]:
  time_stamp           <10000f073>
  next_to_watch        <f3>
  jiffies              <10000f4eb>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[  552.999218] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <f3>
  TDT                  <2f>
  next_to_use          <2f>
  next_to_clean        <f3>
buffer_info[next_to_clean]:
  time_stamp           <10000f073>
  next_to_watch        <f3>
  jiffies              <10000f6df>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[  554.010452] e1000e 0000:00:19.0 eth-office: Reset adapter unexpectedly
[  557.732375] e1000e: eth-office NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 1695.979614] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <5c>
  TDT                  <f1>
  next_to_use          <f1>
  next_to_clean        <5c>
buffer_info[next_to_clean]:
  time_stamp           <1000550c4>
  next_to_watch        <5c>
  jiffies              <100055318>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1697.979546] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <5c>
  TDT                  <f1>
  next_to_use          <f1>
  next_to_clean        <5c>
buffer_info[next_to_clean]:
  time_stamp           <1000550c4>
  next_to_watch        <5c>
  jiffies              <10005550c>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1699.979599] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <5c>
  TDT                  <f1>
  next_to_use          <f1>
  next_to_clean        <5c>
buffer_info[next_to_clean]:
  time_stamp           <1000550c4>
  next_to_watch        <5c>
  jiffies              <100055700>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1701.979440] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <5c>
  TDT                  <f1>
  next_to_use          <f1>
  next_to_clean        <5c>
buffer_info[next_to_clean]:
  time_stamp           <1000550c4>
  next_to_watch        <5c>
  jiffies              <1000558f4>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1702.986675] e1000e 0000:00:19.0 eth-office: Reset adapter unexpectedly
[ 1706.728573] e1000e: eth-office NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 1810.976512] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <b2>
  TDT                  <b9>
  next_to_use          <b9>
  next_to_clean        <b0>
buffer_info[next_to_clean]:
  time_stamp           <10005c0be>
  next_to_watch        <b2>
  jiffies              <10005c366>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1812.976588] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <b2>
  TDT                  <b9>
  next_to_use          <b9>
  next_to_clean        <b0>
buffer_info[next_to_clean]:
  time_stamp           <10005c0be>
  next_to_watch        <b2>
  jiffies              <10005c55a>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1814.976378] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <b2>
  TDT                  <b9>
  next_to_use          <b9>
  next_to_clean        <b0>
buffer_info[next_to_clean]:
  time_stamp           <10005c0be>
  next_to_watch        <b2>
  jiffies              <10005c74e>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1816.976471] e1000e 0000:00:19.0 eth-office: Detected Hardware Unit Hang:
  TDH                  <b2>
  TDT                  <b9>
  next_to_use          <b9>
  next_to_clean        <b0>
buffer_info[next_to_clean]:
  time_stamp           <10005c0be>
  next_to_watch        <b2>
  jiffies              <10005c942>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 1816.986750] e1000e 0000:00:19.0 eth-office: Reset adapter unexpectedly
[ 1820.769572] e1000e: eth-office NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
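
One rough way to read the hang reports above: the gap between jiffies and
time_stamp is how long the descriptor at next_to_clean has been pending. The
following is only a sketch in shell arithmetic. The helper names are
hypothetical, and HZ=250 is an assumption inferred from the log itself
(consecutive reports two seconds apart differ by 0x1f4 = 500 jiffies), not
read from the kernel configuration.

```shell
#!/bin/sh
# Assumption: tick rate inferred from the log (500 jiffies per 2 s), not
# taken from the running kernel's config.
HZ=250

# Hypothetical helper: elapsed jiffies between time_stamp ($1) and jiffies ($2),
# both given as the hex strings printed inside <...> in a hang report.
stall_jiffies() {
    echo $(( 0x$2 - 0x$1 ))
}

# Hypothetical helper: the same interval converted to milliseconds.
stall_ms() {
    echo $(( (0x$2 - 0x$1) * 1000 / HZ ))
}

# First report in the log: time_stamp <fffff592>, jiffies <fffff709>
stall_jiffies fffff592 fffff709   # prints 375
stall_ms      fffff592 fffff709   # prints 1500
```

Within one burst time_stamp stays fixed while jiffies advances, so the same
arithmetic on later reports gives a longer stall, until the watchdog resets
the adapter.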

At that point, as a stab in the dark, I ran:

 ethtool -K eth-office tso off

Since then (about 24 hours) the network has been reliable and no such
messages have appeared.
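
Note that a setting made with ethtool -K does not survive a reboot, so it can
be worth confirming that it is still in effect. A minimal sketch, assuming the
usual "tcp-segmentation-offload: on|off" line in the output of ethtool -k; the
helper name tso_state is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k IFACE` output on stdin and print the
# state of TCP segmentation offload ("on" or "off").
# Indented sub-feature lines (e.g. tx-tcp-segmentation) are not matched, and a
# trailing "[fixed]" marker, if present, is dropped.
tso_state() {
    awk -F': *' '$1 == "tcp-segmentation-offload" {
        split($2, words, " ")
        print words[1]
        exit
    }'
}

# Usage (needs ethtool and the real interface, so it is left commented out):
#   ethtool -k eth-office | tso_state
```

If the interface is managed by ifupdown (an assumption; nothing above says how
eth-office is configured), adding a line such as
"post-up ethtool -K eth-office tso off" to its stanza in
/etc/network/interfaces is one way to reapply the workaround at boot.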

Mike.

