
Bug#542614: linux-image-2.6.26-2-xen-amd64: processes hung on high load



Package: linux-image-2.6.26-2-xen-amd64
Severity: normal

Hi, I've just installed Debian with Xen on our new quad-core Xeon server.
As stress tests I ran the BOINC client computing rosetta@home and a
MySQL server under heavy test load, inserting random data. While
connected to the server over SSH, I noticed some sshd instances
freezing, and I found this error message in dmesg:

[57289.496282] INFO: task sshd:22433 blocked for more than 120 seconds.
[57289.496302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[57289.496312] sshd          D 00000000ffffffff     0 22433  21026
[57289.496333]  ffff8800861d5cd8 0000000000000286 ffffffff8020d0eb 000000010000e033
[57289.496364]  ffff8800fe0bc080 ffff8800ffd0b4c0 ffff8800fe0bc300 00000000fe0bc428
[57289.496395]  ffffffff00000005 0000000000007c08 0000000000000000 000000000000000f
[57289.496403] Call Trace:
[57289.496414]  [<ffffffff8020d0eb>] math_state_restore+0x6d/0x81
[57289.496420]  [<ffffffff803be844>] sock_def_readable+0x32/0x5d
[57289.496425]  [<ffffffff80421363>] unix_stream_sendmsg+0x24c/0x31d
[57289.496429]  [<ffffffff80434306>] schedule_timeout+0x1e/0xad
[57289.496434]  [<ffffffff804336bb>] wait_for_common+0x102/0x1a3
[57289.496439]  [<ffffffff80224d49>] default_wake_function+0x0/0xe
[57289.496444]  [<ffffffff8023c587>] flush_cpu_workqueue+0x9a/0xa3
[57289.496448]  [<ffffffff8023c5b3>] wq_barrier_func+0x0/0x9
[57289.496452]  [<ffffffff8023c5f6>] flush_workqueue+0x3a/0x50
[57289.496456]  [<ffffffff8035c93b>] release_dev+0x481/0x5ce
[57289.496460]  [<ffffffff80223904>] __wake_up+0x38/0x4f
[57289.496464]  [<ffffffff8035ca99>] tty_release+0x11/0x1a
[57289.496469]  [<ffffffff8028b162>] __fput+0xa1/0x16b
[57289.496473]  [<ffffffff80288883>] filp_close+0x5d/0x65
[57289.496476]  [<ffffffff80289bed>] sys_close+0xa5/0x101
[57289.496481]  [<ffffffff8020b528>] system_call+0x68/0x6d
[57289.496484]  [<ffffffff8020b4c0>] system_call+0x0/0x6d
[57289.496487] 
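
To compare reports like the one above across machines, the header line
and the symbol names in the call trace can be pulled out of saved dmesg
output. The following is only a minimal sketch: the function name
parse_hung_tasks, the regular expressions, and the dictionary keys are
illustrative assumptions based on the format of the message quoted
above, not part of any kernel or Debian tool.

```python
import re

# Assumed format, taken from the hung-task message quoted in this report.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# A call-trace frame such as: [<ffffffff804336bb>] wait_for_common+0x102/0x1a3
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(?P<sym>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report found in dmesg_text.

    Hypothetical helper: each dict holds the timestamp, command name,
    PID, timeout, and the symbol names of the frames that follow.
    """
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_TASK_RE.search(line)
        if header:
            current = {
                "timestamp": float(header.group("ts")),
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "timeout_secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group("sym"))
    return reports


# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
[57289.496282] INFO: task sshd:22433 blocked for more than 120 seconds.
[57289.496403] Call Trace:
[57289.496434]  [<ffffffff804336bb>] wait_for_common+0x102/0x1a3
[57289.496452]  [<ffffffff8023c5f6>] flush_workqueue+0x3a/0x50
[57289.496464]  [<ffffffff8035ca99>] tty_release+0x11/0x1a
"""

if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], report["frames"])
```

Saved output (for example from "dmesg > dmesg.txt") can be read and
passed to the function as a string. Frames in such traces may include
stale stack entries, so the symbol list is a hint and not an exact
call chain.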

After I logged in again I saw sshd in D state in htop; after a few
minutes it disappeared. I think this is a scheduler bug, and when I
googled the message I found the same issue reported for Ubuntu:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/276476

They say they have a patch that solves this; I have not tested it yet.

At first I thought it could be caused by a hardware failure, but
dmesg in dom0 doesn't show anything unusual.

Thanks a lot for your time.

Ondrej Kunc
CZOL Media s.r.o.
Czech Republic


-- System Information:
Debian Release: 5.0.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-xen-amd64 (SMP w/4 CPU cores)
Locale: LANG=cs_CZ.UTF-8, LC_CTYPE=cs_CZ.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash


