Bug#654917: linux-image-2.6.32-5-xen-amd64 - Regression: Improper interrupt handling



Source: linux-2.6
Version: 2.6.32-40
Severity: serious

I updated one of the Xen hosts to 2.6.32-40. It is no longer able to
properly communicate with USB devices. This is a regression from -38.

FTDI devices don't give any reply and report errors sometimes:

| [424038.450017] ftdi_sio ttyUSB0: ftdi_submit_read_urb - failed submitting read urb, error -1

The APC UPS connected via a USB hub blocked every now and then and
sometimes even got killed by the hub.

I decided to ask the kernel to suspend the USB hub connecting the APC
devices to give it a reset. This blocked the kernel hard:

| [432063.617877] hub 1-2:1.0: port 1 nyet suspended
| [432063.617957] hub 1-2:1.0: port 2 nyet suspended
| [432063.618021] hub 1-2:1.0: port 3 nyet suspended
| [432063.618111] hub 1-2:1.0: port 4 nyet suspended
| [432063.637219] usb 1-2: clear tt 1 (86d0) error -108
| [432063.637316] usb 1-2: clear tt 1 (86c0) error -113
| [432068.636118] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.3/input0, status -110
| [432073.636156] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.2/input0, status -110
| [432073.648298] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.4/input0, status -71
| [432073.648500] usb 1-2: clear tt 1 (86c0) error -113
| [432073.648576] usb 1-2: clear tt 1 (86d0) error -113
| [432073.648649] usb 1-2: clear tt 1 (06f0) error -113
| [432078.649989] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.3/input0, status -110
| [432078.658002] usb 1-2: clear tt 1 (86c0) error -113
| [432078.658087] usb 1-2: clear tt 1 (86d0) error -113
| [432083.656130] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.2/input0, status -110
| [432088.656134] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.4/input0, status -110
| [432088.656544] usb 1-2: clear tt 1 (86c0) error -113
| [432088.656627] usb 1-2: clear tt 1 (86d0) error -113
| [432088.656700] usb 1-2: clear tt 1 (86f0) error -113
| [432093.656151] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.3/input0, status -110
| [432098.660153] /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:12.2-2.2/input0, status -110
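
For reading the log excerpts above: the negative numbers are kernel errno
values. The following is a minimal sketch that maps them to their symbolic
names, assuming the standard Linux (asm-generic) errno numbering. The table
and the helper name decode_status are illustrative only, and the
descriptions are the generic libc texts; the USB stack uses some of these
codes in a bus-specific sense (see the kernel's USB error code
documentation), so treat the output as a hint, not a diagnosis.

```python
# Linux (asm-generic) errno numbers for the status codes seen in the logs.
# Hard-coded on purpose so the result does not depend on the host platform.
LINUX_ERRNO = {
    1: ("EPERM", "Operation not permitted"),
    71: ("EPROTO", "Protocol error"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    110: ("ETIMEDOUT", "Connection timed out"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def decode_status(status):
    """Return 'status: NAME (description)' for a negative kernel status."""
    name, text = LINUX_ERRNO.get(-status, ("unknown", "not in this table"))
    return f"{status}: {name} ({text})"


if __name__ == "__main__":
    # Codes from the ftdi_sio, hub and usbhid messages quoted in this report.
    for status in (-1, -71, -108, -110, -113):
        print(decode_status(status))
```

Under that assumption, the repeated "status -110" from usbhid is a timeout,
and "-108" and "-113" from the hub are ESHUTDOWN and EHOSTUNREACH.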

After that nothing was working and the network driver reported a
timeout:

| [432122.804560] ------------[ cut here ]------------
| [432122.804658] WARNING: at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_xen/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()
| [432122.804807] Hardware name: H8QG6
| [432122.804865] NETDEV WATCHDOG: eth0 (igb): transmit queue 1 timed out
| [432122.804933] Modules linked in: sha1_generic drbd lru_cache cn xen_evtchn xenfs ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables xt_physdev ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp ipmi_devintf ipmi_si ipmi_msghandler sg sr_mod cdrom ftdi_sio snd_pcm snd_timer snd soundcore snd_page_alloc psmouse i2c_piix4 edac_core pcspkr serio_raw i2c_core usbserial k10temp edac_mce_amd button processor acpi_processor joydev evdev usbhid hid ext4 mbcache jbd2 crc16 dm_mod btrfs zlib_deflate crc32c libcrc32c usb_storage sd_mod crc_t10dif ohci_hcd ehci_hcd 3w_9xxx igb dca usbcore nls_base scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
| [432122.806185] Pid: 0, comm: swapper Tainted: G        W  2.6.32-5-xen-amd64 #1
| [432122.806261] Call Trace:
| [432122.806314]  <IRQ>  [<ffffffff81273542>] ? dev_watchdog+0xe2/0x194
| [432122.806403]  [<ffffffff81273542>] ? dev_watchdog+0xe2/0x194
| [432122.806476]  [<ffffffff8104ef8c>] ? warn_slowpath_common+0x77/0xa3
| [432122.806554]  [<ffffffff81273460>] ? dev_watchdog+0x0/0x194
| [432122.806624]  [<ffffffff8104f014>] ? warn_slowpath_fmt+0x51/0x59
| [432122.806700]  [<ffffffff8130dbda>] ? _spin_unlock_irqrestore+0xd/0xe
| [432122.806773]  [<ffffffff8104b4aa>] ? try_to_wake_up+0x289/0x29b
| [432122.806846]  [<ffffffff8107120f>] ? tick_dev_program_event+0x2d/0x95
| [432122.806922]  [<ffffffff81273434>] ? netif_tx_lock+0x3d/0x69
| [432122.806996]  [<ffffffff8125de9b>] ? netdev_drivername+0x3b/0x40
| [432122.807072]  [<ffffffff81273542>] ? dev_watchdog+0xe2/0x194
| [432122.807151]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
| [432122.807222]  [<ffffffff8100ec12>] ? xen_vcpuop_set_next_event+0x0/0x60
| [432122.807299]  [<ffffffff8105b66f>] ? run_timer_softirq+0x1c9/0x268
| [432122.807375]  [<ffffffff81054d1b>] ? __do_softirq+0xdd/0x1a6
| [432122.807475]  [<ffffffff811f2783>] ? __xen_evtchn_do_upcall+0x245/0x28d
| [432122.807551]  [<ffffffff81012cac>] ? call_softirq+0x1c/0x30
| [432122.807621]  [<ffffffff8101422b>] ? do_softirq+0x3f/0x7c
| [432122.807691]  [<ffffffff81054b8b>] ? irq_exit+0x36/0x76
| [432122.807759]  [<ffffffff811f2f74>] ? xen_evtchn_do_upcall+0x33/0x42
| [432122.807832]  [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
| [432122.807903]  <EOI>  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
| [432122.807988]  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
| [432122.808060]  [<ffffffff8100e6b3>] ? xen_safe_halt+0xc/0x15
| [432122.808130]  [<ffffffff8100bfc7>] ? xen_idle+0x37/0x40
| [432122.808197]  [<ffffffff81010e97>] ? cpu_idle+0xa2/0xda
| [432122.808266]  [<ffffffff8100ec99>] ? xen_irq_enable_direct_end+0x0/0x7
| [432122.808339]  [<ffffffff81302458>] ? cpu_bringup+0x6d/0x72
| [432122.808406] ---[ end trace a7919e7f17c0a727 ]---
| [432122.808536] igb 0000:02:00.0: eth0: Reset adapter

Now the kernel only reports blocked tasks in different states:

| [432240.397534] INFO: task screen:3025 blocked for more than 120 seconds.
| [432240.397534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [432240.397534] screen        D 0000000000000002     0  3025   3024 0x00000000
| [432240.397534]  ffff8800cf6e8e20 0000000000000286 ffffffff8100ecdf ffffffff8100c1a7
| [432240.397534]  ffff8800cf6e8e68 0000000000000000 000000000000f9e0 ffff8800cd891fd8
| [432240.397534]  0000000000015780 0000000000015780 ffff8800ce09f100 ffff8800ce09f3f8
| [432240.397534] Call Trace:
| [432240.397534]  [<ffffffff8100ecdf>] ? xen_restore_fl_direct_end+0x0/0x1
| [432240.397534]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
| [432240.397534]  [<ffffffff810106c4>] ? __switch_to+0x1ad/0x297
| [432240.397534]  [<ffffffff8130cbfd>] ? schedule_timeout+0x2e/0xdd
| [432240.397534]  [<ffffffff810492e7>] ? finish_task_switch+0x44/0xaf
| [432240.397534]  [<ffffffff8130c840>] ? thread_return+0x79/0xe0
| [432240.397534]  [<ffffffff8130cab4>] ? wait_for_common+0xde/0x15b
| [432240.397534]  [<ffffffff8104b4bc>] ? default_wake_function+0x0/0x9
| [432240.397534]  [<ffffffff8106327a>] ? flush_work+0x75/0x87
| [432240.397534]  [<ffffffff81062c54>] ? wq_barrier_func+0x0/0x9
| [432240.397534]  [<ffffffff8120b9f8>] ? n_tty_poll+0x5e/0x138
| [432240.397534]  [<ffffffff812083e6>] ? tty_poll+0x56/0x6d
| [432240.397534]  [<ffffffff810fd736>] ? do_select+0x37b/0x57a
| [432240.397534]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
| [432240.397534]  [<ffffffff810fddaf>] ? __pollwait+0x0/0xd6
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff810fde85>] ? pollwake+0x0/0x5b
| [432240.397534]  [<ffffffff8100e635>] ? xen_force_evtchn_callback+0x9/0xa
| [432240.397534]  [<ffffffff810fdab9>] ? core_sys_select+0x184/0x21e
| [432240.397534]  [<ffffffff81154621>] ? cap_file_permission+0x0/0x3
| [432240.397534]  [<ffffffff8100e635>] ? xen_force_evtchn_callback+0x9/0xa
| [432240.397534]  [<ffffffff81154621>] ? cap_file_permission+0x0/0x3
| [432240.397534]  [<ffffffff8100e635>] ? xen_force_evtchn_callback+0x9/0xa
| [432240.397534]  [<ffffffff8102ddc4>] ? pvclock_clocksource_read+0x3a/0x8b
| [432240.397534]  [<ffffffff81154621>] ? cap_file_permission+0x0/0x3
| [432240.397534]  [<ffffffff8106d4ef>] ? ktime_get_ts+0x68/0xb2
| [432240.397534]  [<ffffffff810fdd86>] ? sys_select+0x92/0xbb
| [432240.397534]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b

Bastian

-- 
Beam me up, Scotty, there's no intelligent life down here!


