Bug#517449: marked as done (linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds))
Your message dated Fri, 12 Mar 2010 00:25:21 +0100
with message-id <20100311232521.GA3568@baikonur.stro.at>
and subject line Re: linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)
has caused the Debian Bug report #517449,
regarding linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)
--
517449: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=517449
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)
- From: Thibaut VARENE <varenet@debian.org>
- Date: Fri, 27 Feb 2009 21:48:20 +0100
- Message-id: <20090227204820.31664.40890.reportbug@lilo.esiee.fr>
Package: linux-image-2.6.26-1-amd64
Version: 2.6.26-13
Severity: important
Tags: patch
* ISSUE
Lenny's kernel is subject to the bug described here:
http://lkml.org/lkml/2009/1/11/70
* ANALYSIS & FIX
The bug is analysed and fixed in this thread:
http://lkml.org/lkml/2009/1/15/107
(in particular with http://lkml.org/lkml/2009/1/15/231 and http://lkml.org/lkml/2009/1/15/240, AFAIU)
FWIW, this seems to have made it to 2.6.28.y at least, with:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=046e7f77d734778a3b2e7d51ce63da3dbe7a8168
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=df94b12439ca449a852e579fc2952dac80f70c90
* TYPICAL SYMPTOMS
Basically, running tasks under the SCHED_IDLE policy (such as BOINC) renders the system sluggish and randomly unresponsive.
Messages such as this one appear in dmesg:
[1830473.188790] INFO: task pdflush:3945 blocked for more than 120 seconds.
[1830473.269257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1830473.365255] pdflush D ffff81003c595800 0 3945 2
[1830473.365274] ffff81002f055d60 0000000000000046 ffff81002f055d70 ffffffff804289af
[1830473.365278] ffff8100258eaee0 ffff81003e966f60 ffff8100258eb168 000000028031dc67
[1830473.365281] 0000000000000008 ffffffffa03445b4 ffff81000a2e8340 000000000000000b
[1830473.365283] Call Trace:
[1830473.365380] [<ffffffff804289af>] thread_return+0x6b/0xac
[1830473.365448] [<ffffffffa03445b4>] :xfs:xfs_log_move_tail+0x46/0x12c
[1830473.365472] [<ffffffffa035b055>] :xfs:xfs_buf_wait_unpin+0x86/0xa8
[1830473.365479] [<ffffffff8022c202>] default_wake_function+0x0/0xe
[1830473.365503] [<ffffffffa035b49b>] :xfs:xfs_buf_iorequest+0x20/0x61
[1830473.365538] [<ffffffffa035ec2e>] :xfs:xfs_bdstrat_cb+0x36/0x3a
[1830473.365559] [<ffffffffa0357d59>] :xfs:xfs_bwrite+0x5e/0xbb
[1830473.365580] [<ffffffffa0352209>] :xfs:xfs_syncsub+0x119/0x226
[1830473.365602] [<ffffffffa03600d4>] :xfs:xfs_fs_write_super+0x1b/0x21
[1830473.365608] [<ffffffff8029cd90>] sync_supers+0x60/0xa4
[1830473.365615] [<ffffffff802783f2>] pdflush+0x0/0x211
[1830473.365619] [<ffffffff80277fb9>] wb_kupdate+0x2d/0x10d
[1830473.369036] [<ffffffff802783f2>] pdflush+0x0/0x211
[1830473.369036] [<ffffffff80278556>] pdflush+0x164/0x211
[1830473.369036] [<ffffffff80277f8c>] wb_kupdate+0x0/0x10d
[1830473.369036] [<ffffffff80246083>] kthread+0x47/0x74
[1830473.369036] [<ffffffff80230196>] schedule_tail+0x27/0x5c
[1830473.369036] [<ffffffff8020cf28>] child_rip+0xa/0x12
[1830473.369036] [<ffffffff80213299>] restore_i387_ia32+0xb0/0xd4
[1830473.369036] [<ffffffff8024603c>] kthread+0x0/0x74
[1830473.369036] [<ffffffff8020cf1e>] child_rip+0x0/0x12
Sometimes keyboard input yields repeated keystrokes and SSH sessions stop echoing. Basically, hell freezes over
for 2 minutes.
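To spot these episodes without reading dmesg by hand, the hung-task lines can be picked out with a small filter. This is only a minimal sketch: it assumes the message format shown in the trace above, and the names `HUNG_TASK_RE` and `find_hung_tasks` are made up for illustration, not part of any existing tool.

```python
import re
import sys

# Matches lines such as:
# [1830473.188790] INFO: task pdflush:3945 blocked for more than 120 seconds.
# The format is assumed from the trace quoted in this report.
HUNG_TASK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines):
    """Return (timestamp, task name, pid, seconds) for each hung-task line."""
    hits = []
    for line in lines:
        match = HUNG_TASK_RE.match(line)
        if match:
            hits.append((
                float(match.group("ts")),
                match.group("comm"),
                int(match.group("pid")),
                int(match.group("secs")),
            ))
    return hits


if __name__ == "__main__":
    # Reads kernel log text from standard input.
    for ts, comm, pid, secs in find_hung_tasks(sys.stdin):
        print(f"{ts:.6f} {comm} (pid {pid}) blocked > {secs}s")
```

It can be fed from a pipe, e.g. `dmesg | python3 hung_tasks.py`, where the script name is arbitrary.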
I believe this bug relates to #498328, #499046 and possibly #499198
This is an extremely nasty bug. I've seen it very frequently while running BOINC on Xen dom0 on a 16-core box (using
the Debian Xen kernel). A temporary workaround has been to cap BOINC to 90% CPU usage: freezes still happen but are shorter.
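For anyone wanting to try this without installing BOINC, a CPU-bound process can be placed under SCHED_IDLE by hand. The following is a rough sketch only: whether it triggers the freezes on a given box is an assumption and has not been verified here, and `become_idle` and `busy_loop` are hypothetical helper names. It relies on Python's `os.sched_setscheduler`, which is Linux-only.

```python
import os


def become_idle():
    """Switch the calling process to the SCHED_IDLE policy (Linux only)."""
    # pid 0 means "this process"; SCHED_IDLE requires static priority 0.
    os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))


def busy_loop(iterations):
    """Burn CPU for a fixed number of iterations and return a checksum."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1000003
    return total


if __name__ == "__main__":
    become_idle()
    # Spin forever at idle priority; stop with Ctrl-C.
    while True:
        busy_loop(1_000_000)
```

Starting one instance per core should roughly approximate an uncapped BOINC load.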
HTH
-- Package-specific info:
** Version:
Linux version 2.6.26-1-amd64 (Debian 2.6.26-13) (waldi@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-24)) #1 SMP Sat Jan 10 17:57:00 UTC 2009
-- System Information:
Debian Release: 5.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.26-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages linux-image-2.6.26-1-amd64 depends on:
ii  debconf [debconf-2.0]          1.5.24  Debian configuration management sy
ii  initramfs-tools [linux-initra  0.92o   tools for generating an initramfs
ii  module-init-tools              3.4-1   tools for managing Linux kernel mo
linux-image-2.6.26-1-amd64 recommends no packages.
Versions of packages linux-image-2.6.26-1-amd64 suggests:
ii  grub              0.97-47lenny2  GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26  <none>         (no description available)
-- debconf information:
linux-image-2.6.26-1-amd64/postinst/create-kimage-link-2.6.26-1-amd64: true
shared/kernel-image/really-run-bootloader: true
linux-image-2.6.26-1-amd64/postinst/kimage-is-a-directory:
linux-image-2.6.26-1-amd64/preinst/bootloader-initrd-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/old-initrd-link-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/preinst/initrd-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/postinst/old-system-map-link-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/depmod-error-initrd-2.6.26-1-amd64: false
linux-image-2.6.26-1-amd64/preinst/overwriting-modules-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/preinst/elilo-initrd-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/bootloader-error-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/preinst/abort-install-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/preinst/lilo-initrd-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/depmod-error-2.6.26-1-amd64: false
linux-image-2.6.26-1-amd64/prerm/removing-running-kernel-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/prerm/would-invalidate-boot-loader-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/postinst/bootloader-test-error-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/preinst/abort-overwrite-2.6.26-1-amd64:
linux-image-2.6.26-1-amd64/postinst/old-dir-initrd-link-2.6.26-1-amd64: true
linux-image-2.6.26-1-amd64/preinst/lilo-has-ramdisk:
linux-image-2.6.26-1-amd64/preinst/failed-to-move-modules-2.6.26-1-amd64:
--- End Message ---
--- Begin Message ---
- To: 517449-done@bugs.debian.org
- Subject: Re: linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)
- From: maximilian attems <max@stro.at>
- Date: Fri, 12 Mar 2010 00:25:21 +0100
- Message-id: <20100311232521.GA3568@baikonur.stro.at>
Version: 2.6.26-21
This should have been fixed in a stable update, thus closing.
Thanks for the report.
--- End Message ---