
Bug#721316: base: NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out



Hi,

We appear to be hitting the same problem on one of our Pacemaker
clusters. It is causing us serious issues, because the affected nodes
are rebooted whenever the problem occurs.

Has any progress been made in identifying a cause for this and/or curing
the problem?

From dmesg:

> Dec 28 23:16:32 tyne kernel: [418756.268195] WARNING: at /build/linux-rrsxby/linux-3.2.51/net/sched/sch_generic.c:256 dev_watchdog+0xf2/0x151()
> Dec 28 23:16:32 tyne kernel: [418756.382761] Hardware name: X9DRD-iF
> Dec 28 23:16:32 tyne kernel: [418756.496392] NETDEV WATCHDOG: eth1 (igb): transmit queue 1 timed out
> Dec 28 23:16:33 tyne kernel: [418756.607364] Modules linked in: hmac dlm sctp libcrc32c configfs ip6table_filter ebtable_nat ebtables act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_statistic xt_CT xt_time xt_connlimit xt_realm xt_addrtype iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_
> Dec 28 23:16:34 tyne kernel: mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iptable_filter ip_tables x_tables nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding sha1_ssse3 sha1_generic ipmi_poweroff ipmi_devintf ipmi_si ipmi_msghandler vhost_net macvtap macvlan tun drbd lru_cache bridge stp loop kvm_intel kvm snd_pcm snd_timer coretemp snd soundcore acpi_cpufreq crc32c_intel ghash_clmulni_intel mperf aesni_intel psmouse snd_page_alloc cryptd iTCO_wdt sb_edac processor i2c_i801 serio_raw aes_x86_64 ioatdma pcspkr iTCO_vendor_support aes_generic thermal_sys i2c_core joydev edac_core evdev container button acpi_pad ext4 crc16 jbd2 mbcache dm_mod raid1 md_mod microcode usbhid hid sg sd_mod crc_t10dif ahci lib
> Dec 28 23:16:34 tyne kernel: ahci isci libsas libata ehci_hcd scsi_transport_sas usbcore igb scsi_mod usb_common dca [last unloaded: scsi_wait_scan]
> Dec 28 23:16:34 tyne kernel: [418758.541550] Pid: 0, comm: swapper/0 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.51-1
> Dec 28 23:16:34 tyne kernel: [418758.652098] Call Trace:
> Dec 28 23:16:35 tyne kernel: [418758.761884]  <IRQ>  [<ffffffff81046cbd>] ? warn_slowpath_common+0x78/0x8c
> Dec 28 23:16:35 tyne kernel: [418758.869948]  [<ffffffff81046d69>] ? warn_slowpath_fmt+0x45/0x4a
> Dec 28 23:16:35 tyne kernel: [418758.977593]  [<ffffffff812a6f11>] ? netif_tx_lock+0x40/0x75
> Dec 28 23:16:35 tyne kernel: [418759.082681]  [<ffffffff812a7081>] ? dev_watchdog+0xf2/0x151
> Dec 28 23:16:35 tyne kernel: [418759.186240]  [<ffffffff81052480>] ? run_timer_softirq+0x19a/0x261
> Dec 28 23:16:35 tyne kernel: [418759.287841]  [<ffffffff812a6f8f>] ? netif_tx_unlock+0x49/0x49
> Dec 28 23:16:35 tyne kernel: [418759.387569]  [<ffffffff8104c2f8>] ? __do_softirq+0xb9/0x177
> Dec 28 23:16:35 tyne kernel: [418759.486351]  [<ffffffff81096529>] ? rcu_needs_cpu+0x50/0x1bb
> Dec 28 23:16:35 tyne kernel: [418759.583008]  [<ffffffff8135646c>] ? call_softirq+0x1c/0x30
> Dec 28 23:16:35 tyne kernel: [418759.677333]  [<ffffffff8100f8cd>] ? do_softirq+0x3c/0x7b
> Dec 28 23:16:36 tyne kernel: [418759.770142]  [<ffffffff8104c560>] ? irq_exit+0x3c/0x99
> Dec 28 23:16:36 tyne kernel: [418759.860906]  [<ffffffff8100f5fd>] ? do_IRQ+0x82/0x98
> Dec 28 23:16:36 tyne kernel: [418759.954639]  [<ffffffff8134f4ee>] ? common_interrupt+0x6e/0x6e
> Dec 28 23:16:36 tyne kernel: [418760.048124]  <EOI>  [<ffffffff811ee07d>] ? intel_idle+0xea/0x119
> Dec 28 23:16:36 tyne kernel: [418760.137012]  [<ffffffff811ee05c>] ? intel_idle+0xc9/0x119
> Dec 28 23:16:36 tyne kernel: [418760.222705]  [<ffffffff8126febd>] ? cpuidle_idle_call+0xec/0x179
> Dec 28 23:16:36 tyne kernel: [418760.306317]  [<ffffffff8100d243>] ? cpu_idle+0xa5/0xf2
> Dec 28 23:16:36 tyne kernel: [418760.388391]  [<ffffffff816abb36>] ? start_kernel+0x3b8/0x3c3
> Dec 28 23:16:36 tyne kernel: [418760.470137]  [<ffffffff816ab140>] ? early_idt_handlers+0x140/0x140
> Dec 28 23:16:36 tyne kernel: [418760.548953]  [<ffffffff816ab3c4>] ? x86_64_start_kernel+0x104/0x111
> Dec 28 23:16:36 tyne kernel: [418760.626209] ---[ end trace 25448d4e9ff0e259 ]---
> Dec 28 23:16:37 tyne kernel: [418760.710249] igb 0000:06:00.1: eth1: Reset adapter
> Dec 28 23:16:37 tyne kernel: [418760.814181] igb 0000:06:00.0: eth0: Reset adapter
- and -
> Dec 28 23:16:32 tees kernel: [419013.476706] WARNING: at /build/linux-rrsxby/linux-3.2.51/net/sched/sch_generic.c:256 dev_watchdog+0xf2/0x151()
> Dec 28 23:16:33 tees kernel: [419013.591003] Hardware name: X9DRD-iF
> Dec 28 23:16:33 tees kernel: [419013.705052] NETDEV WATCHDOG: eth1 (igb): transmit queue 3 timed out
> Dec 28 23:16:34 tees kernel: [419013.817376] Modules linked in: hmac dlm sctp libcrc32c configfs ip6table_filter ebtable_nat ebtables act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_statistic xt_CT xt_time xt_connlimit xt_realm xt_addrtype iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_
> Dec 28 23:16:34 tees kernel: mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iptable_filter ip_tables x_tables nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding sha1_ssse3 sha1_generic ipmi_poweroff ipmi_devintf ipmi_si ipmi_msghandler vhost_net macvtap macvlan tun drbd lru_cache bridge stp loop kvm_intel kvm snd_pcm snd_timer snd i2c_i801 coretemp crc32c_intel iTCO_wdt soundcore ghash_clmulni_intel acpi_cpufreq 
(this is as far as that server got before being STONITHed)
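
To compare the timeouts across nodes, the NETDEV WATCHDOG lines can be
pulled out of the logs mechanically. The following is only a minimal
sketch: the function name find_watchdog_timeouts and the regular
expression are assumptions based on the syslog format quoted above, and
other syslog configurations may need a different pattern.

```python
import re

# Pattern assumed from the syslog lines quoted in this report; an optional
# leading "> " (mail quoting) is tolerated.
WATCHDOG_RE = re.compile(
    r"^(?:>\s*)?(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:.*?"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return one dict per NETDEV WATCHDOG timeout line found in `lines`."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            event = match.groupdict()
            event["queue"] = int(event["queue"])  # queue index as a number
            events.append(event)
    return events


if __name__ == "__main__":
    # The two watchdog lines from the logs above.
    sample = [
        "> Dec 28 23:16:32 tyne kernel: [418756.496392] NETDEV WATCHDOG: eth1 (igb): transmit queue 1 timed out",
        "> Dec 28 23:16:33 tees kernel: [419013.705052] NETDEV WATCHDOG: eth1 (igb): transmit queue 3 timed out",
    ]
    for event in find_watchdog_timeouts(sample):
        print(event["stamp"], event["host"], event["iface"],
              event["driver"], "queue", event["queue"])
```

Lines that do not contain a watchdog message are simply skipped, so a
whole log file can be passed in line by line.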

Both servers have Supermicro X9DRD-iF motherboards and are running
linux-image-3.2.0-4-amd64 3.2.51-1.

lspci -vvv for one of the ports in question (eth1 on tyne) is:
> 06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 	Subsystem: Super Micro Computer Inc Device 1521
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin B routed to IRQ 17
> 	Region 0: Memory at fbd00000 (32-bit, non-prefetchable) [size=128K]
> 	Region 2: I/O ports at d000 [size=32]
> 	Region 3: Memory at fbdc0000 (32-bit, non-prefetchable) [size=16K]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 		Address: 0000000000000000  Data: 0000
> 		Masking: 00000000  Pending: 00000000
> 	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
> 		Vector table: BAR=3 offset=00000000
> 		PBA: BAR=3 offset=00002000
> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
> 		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
> 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <4us, L1 <32us
> 			ClockPM- Surprise- LLActRep- BwNot-
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
> 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [100 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> 	Capabilities: [140 v1] Device Serial Number 00-25-90-ff-ff-4e-ae-18
> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> 		ARICap:	MFVC- ACS-, Next Function: 0
> 		ARICtl:	MFVC- ACS-, Function Group: 0
> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> 		IOVCap:	Migration-, Interrupt Message Number: 000
> 		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
> 		IOVSta:	Migration-
> 		Initial VFs: 8, Total VFs: 8, Number of VFs: 8, Function Dependency Link: 01
> 		VF offset: 384, stride: 4, Device ID: 1520
> 		Supported Page Size: 00000553, System Page Size: 00000001
> 		Region 0: Memory at fbd60000 (32-bit, non-prefetchable)
> 		Region 3: Memory at fbd40000 (32-bit, non-prefetchable)
> 		VF Migration: offset: 00000000, BIR: 0
> 	Capabilities: [1a0 v1] Transaction Processing Hints
> 		Device specific mode supported
> 		Steering table in TPH capability structure
> 	Capabilities: [1d0 v1] Access Control Services
> 		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 	Kernel driver in use: igb
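
To compare the PCIe status bits between ports or nodes, a flag line
from the lspci output above can be turned into a dictionary. This is a
rough sketch only: the helper name parse_flags is hypothetical, and it
handles just the simple "Name+" / "Name-" tokens, not fields such as
"MaxPayload 128 bytes".

```python
import re

# A flag token is a name followed by "+" (set) or "-" (clear), delimited by
# whitespace or the start/end of the line.
FLAG_RE = re.compile(r"(?<!\S)([A-Za-z][A-Za-z0-9]*)([+-])(?!\S)")


def parse_flags(line):
    """Map each 'Name+' / 'Name-' token in an lspci line to True / False."""
    return {name: sign == "+" for name, sign in FLAG_RE.findall(line)}


# The DevSta line from the lspci output above.
line = "DevSta:\tCorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-"
flags = parse_flags(line)
print(sorted(name for name, is_set in flags.items() if is_set))
```

Running the same helper over the output from each port makes it easy to
diff which bits are set where.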

Please let me know if I can provide any further information.

Best regards,
Chris

-- 
Chris Boot
Tiger Computing Ltd
"Linux for Business"

Tel: 01600 483 484
Web: http://www.tiger-computing.co.uk
Follow us on Facebook: http://www.facebook.com/TigerComputing

Registered in England. Company number: 3389961
Registered address: Wyastone Business Park,
 Wyastone Leys, Monmouth, NP25 3SR

