Bug#501742: linux-image-2.6.26-1-amd64: Random hangs/slowness and forcedeth problem
Package: linux-image-2.6.26-1-amd64
Version: 2.6.26-5
Severity: important
On versions before 2.6.26 I have been getting lots of messages like
this:
eth0: too many iterations (6) in nv_nic_irq
Apart from filling up the log, there has been no noticeable impact on the
system.
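To put a number on how often these messages appear, a small script along
the following lines could be run over the kernel log. This is only a
sketch: the two patterns are copied from the messages quoted in this
report, while the function name count_forcedeth_events and the idea of
feeding it lines from a saved log file are my own assumptions.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this report.
ITERATIONS_RE = re.compile(r"(\w+): too many iterations \(\d+\) in nv_nic_irq")
TIMEOUT_RE = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")


def count_forcedeth_events(lines):
    """Return two Counters keyed by interface name:
    (too-many-iterations messages, transmit timeouts)."""
    iterations = Counter()
    timeouts = Counter()
    for line in lines:
        match = ITERATIONS_RE.search(line)
        if match:
            iterations[match.group(1)] += 1
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
    return iterations, timeouts
```

It accepts any iterable of lines, so an open log file can be passed in
directly. Comparing the two counters per interface would show whether the
transmit timeouts only occur on the interface that also logs the
"too many iterations" messages.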
After upgrading to 2.6.26, the system started to misbehave. It would
work for a few hours, and then it would slow down to the degree where a
simple command could take several minutes to complete. Finally, it
would become totally unresponsive leaving the reset button as the only
option.
Browsing through the bug reports, it looked like the hpet problem, so I
tried booting with hpet=disable. With this kernel option the system
worked for an hour and then the network stopped working with this
message in the log:
eth0: too many iterations (6) in nv_nic_irq.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Got tx_timeout. irq: 00000032
eth0: Ring at 7d084000
eth0: Dumping tx registers
<register dump>
eth0: Dumping tx ring
<more dumps>
eth0: tx_timeout: dead entries
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0xa6/0xfb()
Modules linked in: xt_limit xt_state ipt_REJECT xt_tcpudp
ipt_MASQUERADE iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack iptable_filter ip_tables x_tables video output ac battery
nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc ipv6 it87 hwmon_vid
loop parport_pc parport snd_hda_intel pcspkr k8temp usblp snd_pcm
snd_timer snd soundcore snd_page_alloc i2c_nforce2 i2c_core button
evdev ext3 jbd mbcache raid1 md_mod ide_cd_mod cdrom sd_mod
ide_pci_generic jmicron usb_storage amd74xx ide_core floppy ahci
ohci1394 ieee1394 forcedeth ata_generic sata_nv libata scsi_mod
ehci_hcd dock ohci_hcd thermal processor fan thermal_sys
Pid: 0, comm: swapper Not tainted 2.6.26-1-amd64 #1
Call Trace:
<IRQ> [<ffffffff80234878>] warn_on_slowpath+0x51/0x7a
[<ffffffffa009bf69>] :forcedeth:reg_delay+0x40/0x8a
[<ffffffffa009cb2f>] :forcedeth:nv_drain_tx+0xb4/0x186
[<ffffffffa00a11c7>] :forcedeth:nv_tx_timeout+0x1fb/0x2a4
[<ffffffff803cbd6a>] dev_watchdog+0x0/0xfb
[<ffffffff803cbe10>] dev_watchdog+0xa6/0xfb
[<ffffffff803cbd6a>] dev_watchdog+0x0/0xfb
[<ffffffff8023c861>] run_timer_softirq+0x16a/0x1e2
[<ffffffff80248bef>] ktime_get+0xc/0x41
[<ffffffff8023922f>] __do_softirq+0x5c/0xd1
[<ffffffff8020d29c>] call_softirq+0x1c/0x28
[<ffffffff8020f37c>] do_softirq+0x3c/0x81
[<ffffffff8023918f>] irq_exit+0x3f/0x83
[<ffffffff8021a9eb>] smp_apic_timer_interrupt+0x8c/0xa4
[<ffffffff8020b0a3>] default_idle+0x0/0x49
[<ffffffff8020ccc2>] apic_timer_interrupt+0x72/0x80
<EOI> [<ffffffff8021a797>] lapic_next_event+0x0/0x13
[<ffffffff8021eb20>] native_safe_halt+0x2/0x3
[<ffffffff8021eb20>] native_safe_halt+0x2/0x3
[<ffffffff8020b0cd>] default_idle+0x2a/0x49
[<ffffffff8020ac79>] cpu_idle+0x89/0xb3
---[ end trace 314e3fb7eb127ca0 ]---
I don't know whether the behaviours with and without hpet=disable are
symptoms of the same problem or two different bugs.
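When comparing the two cases it helps to confirm that hpet=disable was
really in effect for a given boot. A minimal sketch of such a check
follows. It assumes the command line is a plain space-separated string,
as found in /proc/cmdline, and it does not handle quoted values. The
helper names kernel_params and hpet_disabled are made up for
illustration.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "ro" map to None. Quoted values containing
    spaces are not handled by this simple split.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def hpet_disabled(cmdline):
    """True if the command line contains hpet=disable."""
    return kernel_params(cmdline).get("hpet") == "disable"
```

On a running system the input would come from /proc/cmdline, for example
hpet_disabled(open("/proc/cmdline").read()).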
The other network interface on this MB (Asus M2N-SLI Deluxe) also uses
forcedeth, but doesn't report any problems.
This is a production server/firewall, and I wasn't able to take any more
downtime, so when hpet=disable didn't work, I reverted to a previous
kernel (2.6.24-7). Apart from the "normal" error messages ("too many
iterations...") the system has been stable for three days now.
-- Package-specific info:
-- System Information:
Debian Release: lenny/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.24-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages linux-image-2.6.26-1-amd64 depends on:
ii debconf [debconf-2.0] 1.5.22 Debian configuration management sy
ii initramfs-tools [linux-initra 0.92j tools for generating an initramfs
ii module-init-tools 3.4-1 tools for managing Linux kernel mo
linux-image-2.6.26-1-amd64 recommends no packages.
Versions of packages linux-image-2.6.26-1-amd64 suggests:
ii grub 0.97-47 GRand Unified Bootloader (Legacy v
pn linux-doc-2.6.26 <none> (no description available)
-- debconf information:
linux-image-2.6.26-1-amd64/postinst/create-kimage-link-2.6.26-1-amd64: true
shared/kernel-image/really-run-bootloader: true
linux-image-2.6.26-1-amd64/postinst/kimage-is-a-directory:
linux-image-2.6.26-1-amd64/preinst/bootloader-initrd-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/old-initrd-link-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/preinst/initrd-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/postinst/old-system-map-link-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/depmod-error-initrd-2.6.26-1-amd64: false
linux-image-2.6.26-1-amd64/preinst/overwriting-modules-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/preinst/elilo-initrd-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/bootloader-error-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/preinst/abort-install-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/preinst/lilo-initrd-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/depmod-error-2.6.26-1-amd64: false
linux-image-2.6.26-1-amd64/prerm/removing-running-kernel-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/prerm/would-invalidate-boot-loader-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/bootloader-test-error-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/preinst/abort-overwrite-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/postinst/old-dir-initrd-link-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/preinst/lilo-has-ramdisk:
linux-image-2.6.26-1-amd64/preinst/failed-to-move-modules-2.6.26-1-amd64: