[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#670398: linux-image-2.6.32-5-amd64: SSH logins hang while hpet interrupts multiply on Intel Nehalem CPUs



On Thu, 2012-04-26 at 16:17 +0200, Sven Hoexter wrote:
> On Thu, Apr 26, 2012 at 01:45:30PM +0100, Ben Hutchings wrote:
> 
> Hi,
> 
> > You can use 'echo w > /proc/sysrq-trigger' to get a traceback for
> all
> > the tasks in D state, which might provide some clues.
> 
> ok, see the attached file.


> Apr 26 16:08:34 vdf1 kernel: [6726714.281854] SysRq : Show Blocked State
> Apr 26 16:08:34 vdf1 kernel: [6726714.281883]   task                        PC stack   pid father
> Apr 26 16:08:34 vdf1 kernel: [6726714.281912] events/0      D 0000000000000000     0    15      2 0x00000000
> Apr 26 16:08:34 vdf1 kernel: [6726714.281946]  ffffffff814891f0 0000000000000046 0000000000000000 ffff88043e4e5c0c
> Apr 26 16:08:34 vdf1 kernel: [6726714.281997]  0000000000000000 ffff88000fa15780 000000000000f9e0 ffff88043e4e5fd8
> Apr 26 16:08:34 vdf1 kernel: [6726714.282048]  0000000000015780 0000000000015780 ffff88043e4e8000 ffff88043e4e82f8
> Apr 26 16:08:34 vdf1 kernel: [6726714.282099] Call Trace:
> Apr 26 16:08:34 vdf1 kernel: [6726714.282127]  [<ffffffff8105af0e>] ? __mod_timer+0x141/0x153
> Apr 26 16:08:34 vdf1 kernel: [6726714.287558]  [<ffffffff8105a9f4>] ? try_to_del_timer_sync+0x63/0x6c
> Apr 26 16:08:34 vdf1 kernel: [6726714.287589]  [<ffffffff812fbd24>] ? schedule_timeout+0xa5/0xdd
> Apr 26 16:08:34 vdf1 kernel: [6726714.287617]  [<ffffffff8105aa88>] ? process_timeout+0x0/0x5
> Apr 26 16:08:34 vdf1 kernel: [6726714.287651]  [<ffffffffa00fc8f7>] ? ehci_endpoint_disable+0xa4/0x141 [ehci_hcd]
> Apr 26 16:08:34 vdf1 kernel: [6726714.287699]  [<ffffffffa00b8f0d>] ? usb_ep0_reinit+0x13/0x34 [usbcore]
> Apr 26 16:08:34 vdf1 kernel: [6726714.287731]  [<ffffffffa00b970a>] ? usb_reset_and_verify_device+0x87/0x3d6 [usbcore]
> Apr 26 16:08:34 vdf1 kernel: [6726714.287779]  [<ffffffffa00be1c7>] ? usb_kill_urb+0x10/0xbb [usbcore]
> Apr 26 16:08:34 vdf1 kernel: [6726714.287811]  [<ffffffffa00be1c7>] ? usb_kill_urb+0x10/0xbb [usbcore]
> Apr 26 16:08:34 vdf1 kernel: [6726714.287842]  [<ffffffffa00b9aed>] ? usb_reset_device+0x94/0x124 [usbcore]
> Apr 26 16:08:34 vdf1 kernel: [6726714.287873]  [<ffffffffa00d96a0>] ? hid_reset+0x91/0x122 [usbhid]
> Apr 26 16:08:34 vdf1 kernel: [6726714.287903]  [<ffffffff810619f7>] ? worker_thread+0x188/0x21d
> Apr 26 16:08:34 vdf1 kernel: [6726714.287932]  [<ffffffffa00d960f>] ? hid_reset+0x0/0x122 [usbhid]

This is a global workqueue task, and it's running a work item that does
error handling for a USB HID (keyboard or mouse, or possibly a UPS).
Presumably this is taking a long time for some reason.

[...]
> Apr 26 16:08:34 vdf1 kernel: [6726714.288131] sshd          D 0000000000000000     0 17858  17854 0x00000000
> Apr 26 16:08:34 vdf1 kernel: [6726714.288164]  ffff88043e473880 0000000000000086 0000000000000000 0000000000000ffc
> Apr 26 16:08:34 vdf1 kernel: [6726714.288215]  0000000000000000 0000000000000000 000000000000f9e0 ffff880270a55fd8
> Apr 26 16:08:34 vdf1 kernel: [6726714.288266]  0000000000015780 0000000000015780 ffff88043c100710 ffff88043c100a08
> Apr 26 16:08:34 vdf1 kernel: [6726714.288316] Call Trace:
> Apr 26 16:08:34 vdf1 kernel: [6726714.288340]  [<ffffffff8100f6c4>] ? __switch_to+0x1ad/0x297
> Apr 26 16:08:34 vdf1 kernel: [6726714.288367]  [<ffffffff812fbcad>] ? schedule_timeout+0x2e/0xdd
> Apr 26 16:08:34 vdf1 kernel: [6726714.288397]  [<ffffffff810482ed>] ? finish_task_switch+0x3a/0xaf
> Apr 26 16:08:34 vdf1 kernel: [6726714.288426]  [<ffffffff812fb8f0>] ? thread_return+0x79/0xe0
> Apr 26 16:08:34 vdf1 kernel: [6726714.288453]  [<ffffffff812fbb64>] ? wait_for_common+0xde/0x15b
> Apr 26 16:08:34 vdf1 kernel: [6726714.288482]  [<ffffffff8104a4cc>] ? default_wake_function+0x0/0x9
> Apr 26 16:08:34 vdf1 kernel: [6726714.288510]  [<ffffffff81062326>] ? flush_work+0x75/0x87
> Apr 26 16:08:34 vdf1 kernel: [6726714.288538]  [<ffffffff81061d00>] ? wq_barrier_func+0x0/0x9
> Apr 26 16:08:34 vdf1 kernel: [6726714.288567]  [<ffffffff811fbd44>] ? n_tty_poll+0x5e/0x138
[...]

This is the sshd task, and it's waiting for a terminal-related work item
to finish.  Presumably this is queued behind the work item for the USB
HID cleanup.

So it looks like in this case at least you're seeing a bug in USB error
recovery and not anything to do with timing using the TSC vs HPET.

Ben.

-- 
Ben Hutchings
I haven't lost my mind; it's backed up on tape somewhere.

Attachment: signature.asc
Description: This is a digitally signed message part


Reply to: