Bug#542614: linux-image-2.6.26-2-xen-amd64: processes hung on high load
Package: linux-image-2.6.26-2-xen-amd64
Severity: normal
Hi, I've just installed Debian with Xen on our new quad-core Xeon server.
As stress tests I ran the BOINC client computing rosetta@home together
with a MySQL server under heavy test load (inserting random data). I
noticed that some sshd instances froze while I was connected to that
server. I found this error message in dmesg:
[57289.496282] INFO: task sshd:22433 blocked for more than 120 seconds.
[57289.496302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[57289.496312] sshd D 00000000ffffffff 0 22433 21026
[57289.496333] ffff8800861d5cd8 0000000000000286 ffffffff8020d0eb 000000010000e033
[57289.496364] ffff8800fe0bc080 ffff8800ffd0b4c0 ffff8800fe0bc300 00000000fe0bc428
[57289.496395] ffffffff00000005 0000000000007c08 0000000000000000 000000000000000f
[57289.496403] Call Trace:
[57289.496414] [<ffffffff8020d0eb>] math_state_restore+0x6d/0x81
[57289.496420] [<ffffffff803be844>] sock_def_readable+0x32/0x5d
[57289.496425] [<ffffffff80421363>] unix_stream_sendmsg+0x24c/0x31d
[57289.496429] [<ffffffff80434306>] schedule_timeout+0x1e/0xad
[57289.496434] [<ffffffff804336bb>] wait_for_common+0x102/0x1a3
[57289.496439] [<ffffffff80224d49>] default_wake_function+0x0/0xe
[57289.496444] [<ffffffff8023c587>] flush_cpu_workqueue+0x9a/0xa3
[57289.496448] [<ffffffff8023c5b3>] wq_barrier_func+0x0/0x9
[57289.496452] [<ffffffff8023c5f6>] flush_workqueue+0x3a/0x50
[57289.496456] [<ffffffff8035c93b>] release_dev+0x481/0x5ce
[57289.496460] [<ffffffff80223904>] __wake_up+0x38/0x4f
[57289.496464] [<ffffffff8035ca99>] tty_release+0x11/0x1a
[57289.496469] [<ffffffff8028b162>] __fput+0xa1/0x16b
[57289.496473] [<ffffffff80288883>] filp_close+0x5d/0x65
[57289.496476] [<ffffffff80289bed>] sys_close+0xa5/0x101
[57289.496481] [<ffffffff8020b528>] system_call+0x68/0x6d
[57289.496484] [<ffffffff8020b4c0>] system_call+0x0/0x6d
[57289.496487]
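To check whether the warning recurs during further stress runs, the kernel log can be scanned for the "blocked for more than" line. The following is only a minimal sketch: the function name find_hung_tasks and the dictionary keys are made up for illustration, and the pattern is written against the exact wording of the message above, so it may need adjusting for other kernel versions.

```python
import re

# Pattern written against the hung-task warning exactly as it appears in the
# dmesg output above; other kernel versions may word it differently.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return one dict per hung-task warning found in log_text.

    find_hung_tasks and its keys are hypothetical names for this sketch.
    """
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append({
                "uptime": float(m.group("ts")),        # seconds since boot
                "comm": m.group("comm"),               # task name
                "pid": int(m.group("pid")),
                "threshold_secs": int(m.group("secs")),
            })
    return hits
```

It could be fed the saved output of dmesg as a string; each returned entry gives the uptime stamp, task name, PID and the timeout that was exceeded.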
After I logged in again, I saw sshd in D state in htop; after a few
minutes it disappeared. I think this is a scheduler bug. When I
searched for this message, I found the same issue reported for Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/276476
They say they have a patch that solves this; I have not tested it yet.
At first I thought it could be caused by a hardware failure, but
dmesg in dom0 doesn't show anything unusual.
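Since the D state was only visible in htop for a few minutes, it may help to capture it from a script during the stress test. This is a rough sketch, assuming the usual Linux procfs layout where the third field of /proc/<pid>/stat is the process state; the helper names parse_stat_line and uninterruptible_tasks are invented for this example.

```python
import os


def parse_stat_line(text):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs available (for example, not on Linux)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while reading, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible_tasks() in a loop with a short sleep and logging any non-empty result would show when sshd enters D state and for how long. Note that this only looks at the main thread of each process, not at the entries under /proc/<pid>/task.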
Thanks a lot for your time.
Ondrej Kunc
CZOL Media s.r.o.
Czech Republic
-- System Information:
Debian Release: 5.0.2
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.26-2-xen-amd64 (SMP w/4 CPU cores)
Locale: LANG=cs_CZ.UTF-8, LC_CTYPE=cs_CZ.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash