
Wheezy issue with Broadcom 5720 NIC on new Dell PowerEdge



Hello list,

I know this subject has already been discussed on the list, but without any real solution...

There seems to be a real problem between 12th-generation Dell PowerEdge servers and Wheezy involving the Broadcom NIC.

In my case, I'm using a Dell R420 server with an up-to-date Wheezy.

During my tests on our development network, everything was fine.

Yesterday I decided to rack the server in our production network, and I was unable to use eth0:


Jul 11 15:38:13 xxxxxxx kernel: [   15.962677] ------------[ cut here ]------------
Jul 11 15:38:13 xxxxxxx kernel: [   15.962685] WARNING: at /build/linux-s5x2oE/linux-3.2.46/net/sched/sch_generic.c:256 dev_watchdog+0xf2/0x151()
Jul 11 15:38:13 xxxxxxx kernel: [   15.962687] Hardware name: PowerEdge R420
Jul 11 15:38:13 xxxxxxx kernel: [   15.962689] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Jul 11 15:38:13 xxxxxxx kernel: [   15.962691] Modules linked in: ipmi_si ipmi_devintf ipmi_msghandler dell_rbu autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop joydev usbhid hid shpchp iTCO_wdt dcdbas sb_edac snd_pcm edac_core snd_page_alloc snd_timer snd soundcore iTCO_vendor_support evdev pcspkr coretemp crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 aes_generic cryptd acpi_power_meter wmi button processor thermal_sys ext4 crc16 jbd2 mbcache dm_mod sg sr_mod cdrom sd_mod ses crc_t10dif enclosure ahci libahci ehci_hcd tg3 libphy libata usbcore usb_common megaraid_sas scsi_mod [last unloaded: ipmi_si]
Jul 11 15:38:13 xxxxxxx kernel: [   15.962733] Pid: 3576, comm: dsm_sa_datamgrd Tainted: G W 3.2.0-4-amd64 #1 Debian 3.2.46-1
Jul 11 15:38:13 xxxxxxx kernel: [   15.962736] Call Trace:
Jul 11 15:38:13 xxxxxxx kernel: [   15.962737] <IRQ> [<ffffffff81046b75>] ? warn_slowpath_common+0x78/0x8c
Jul 11 15:38:13 xxxxxxx kernel: [   15.962746] [<ffffffff81046c21>] ? warn_slowpath_fmt+0x45/0x4a
Jul 11 15:38:13 xxxxxxx kernel: [   15.962749] [<ffffffff812a68c9>] ? netif_tx_lock+0x40/0x75
Jul 11 15:38:13 xxxxxxx kernel: [   15.962754] [<ffffffff812a6a39>] ? dev_watchdog+0xf2/0x151
Jul 11 15:38:13 xxxxxxx kernel: [   15.962757] [<ffffffff81052334>] ? run_timer_softirq+0x19a/0x261
Jul 11 15:38:13 xxxxxxx kernel: [   15.962760] [<ffffffff812a6947>] ? netif_tx_unlock+0x49/0x49
Jul 11 15:38:13 xxxxxxx kernel: [   15.962765] [<ffffffff810660a1>] ? timekeeping_get_ns+0xd/0x2a
Jul 11 15:38:13 xxxxxxx kernel: [   15.962769] [<ffffffff8104c1ac>] ? __do_softirq+0xb9/0x177
Jul 11 15:38:13 xxxxxxx kernel: [   15.962773] [<ffffffff81355dac>] ? call_softirq+0x1c/0x30
Jul 11 15:38:13 xxxxxxx kernel: [   15.962777] [<ffffffff8100f8cd>] ? do_softirq+0x3c/0x7b
Jul 11 15:38:13 xxxxxxx kernel: [   15.962780] [<ffffffff8104c414>] ? irq_exit+0x3c/0x99
Jul 11 15:38:13 xxxxxxx kernel: [   15.962784] [<ffffffff810241c0>] ? smp_apic_timer_interrupt+0x74/0x82
Jul 11 15:38:13 xxxxxxx kernel: [   15.962787] [<ffffffff8135461e>] ? apic_timer_interrupt+0x6e/0x80
Jul 11 15:38:13 xxxxxxx kernel: [   15.962788] <EOI> [<ffffffff8134d7c1>] ? __schedule+0x5f9/0x610
Jul 11 15:38:13 xxxxxxx kernel: [   15.962795] [<ffffffff811b2b9a>] ? delay_tsc+0x33/0x5e
Jul 11 15:38:13 xxxxxxx kernel: [   15.962798] [<ffffffff811c1639>] ? pci_vpd_pci22_wait+0xb9/0xdd
Jul 11 15:38:13 xxxxxxx kernel: [   15.962802] [<ffffffff810364e8>] ? should_resched+0x5/0x23
Jul 11 15:38:13 xxxxxxx kernel: [   15.962804] [<ffffffff811c1873>] ? pci_vpd_pci22_read+0xa9/0x121
Jul 11 15:38:13 xxxxxxx kernel: [   15.962806] [<ffffffff810364e8>] ? should_resched+0x5/0x23
Jul 11 15:38:13 xxxxxxx kernel: [   15.962811] [<ffffffff81151371>] ? read+0x102/0x182
Jul 11 15:38:13 xxxxxxx kernel: [   15.962815] [<ffffffff810fa443>] ? vfs_read+0x9f/0xe6
Jul 11 15:38:13 xxxxxxx kernel: [   15.962817] [<ffffffff810fa4cf>] ? sys_read+0x45/0x6b
Jul 11 15:38:13 xxxxxxx kernel: [   15.962820] [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
Jul 11 15:38:13 xxxxxxx kernel: [   15.962823] ---[ end trace fde705cbad5d6bde ]---
Jul 11 15:38:13 xxxxxxx kernel: [   15.962825] tg3 0000:02:00.0: eth0: transmit timed out, resetting
Jul 11 15:38:14 xxxxxxx kernel: [   17.219582] tg3 0000:02:00.0: eth0: 0x00000000: 0x165f14e4, 0x00100406, 0x02000000, 0x00800010
Jul 11 15:38:14 xxxxxxx kernel: [   17.219728] tg3 0000:02:00.0: eth0: 0x00000010: 0xd90a000c, 0x00000000, 0xd90b000c, 0x00000000
Jul 11 15:38:14 xxxxxxx kernel: [   17.219873] tg3 0000:02:00.0: eth0: 0x00000020: 0xd90c000c, 0x00000000, 0x00000000, 0x04f81028
Jul 11 15:38:14 xxxxxxx kernel: [   17.220017] tg3 0000:02:00.0: eth0:
...

eth1, however, was working.

I then decided to go back to our development network (20 km away!) with the server, and I was unable to reproduce the problem. Everything works fine there.

During kernel boot I always see this (in both the working and non-working cases):

Jul 11 11:33:05 xxxxxxx kernel: [ 0.629277] pci 0000:02:00.1: address space collision: [mem 0xdd000000-0xdd03ffff pref] conflicts with 0000:02:00.0 [mem 0xdd000000-0xdd03ffff pref]

I don't know whether this is related, or how to solve it.

Does anyone have any suggestions?
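
To compare working and non-working boots more systematically, the two log signatures quoted above could be extracted with a short script. This is only a sketch: the function name `scan` and the regular expressions are assumptions written to match the two excerpts in this message, not anything provided by tg3 or Debian, and other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the two kernel messages quoted in this post (assumed wording).
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \((\S+)\): transmit queue (\d+) timed out"
)
COLLISION = re.compile(
    r"pci (\S+): address space collision: \[mem (\S+) pref\] conflicts with (\S+) \[mem"
)


def scan(lines):
    """Collect watchdog timeouts and PCI address space collisions from log lines.

    Returns two lists:
      watchdog  -> (interface, driver, queue number)
      collision -> (device, memory range, conflicting device)
    """
    watchdog, collision = [], []
    for line in lines:
        m = WATCHDOG.search(line)
        if m:
            watchdog.append((m.group(1), m.group(2), int(m.group(3))))
        m = COLLISION.search(line)
        if m:
            collision.append((m.group(1), m.group(2), m.group(3)))
    return watchdog, collision


# Sample input: two lines copied from the excerpts in this message.
sample = [
    "Jul 11 15:38:13 xxxxxxx kernel: [   15.962689] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out",
    "Jul 11 11:33:05 xxxxxxx kernel: [ 0.629277] pci 0000:02:00.1: address space collision: "
    "[mem 0xdd000000-0xdd03ffff pref] conflicts with 0000:02:00.0 [mem 0xdd000000-0xdd03ffff pref]",
]
timeouts, collisions = scan(sample)
print("watchdog timeouts:", timeouts)
print("address space collisions:", collisions)
```

Running `scan` over the saved kernel log of each boot (for example by passing an open file object) would show whether the collision appears in every boot while the watchdog timeout appears only on the production network.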

Many thanks

--
Erwan Loaec

