
Bug#517449: marked as done (linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds))



Your message dated Fri, 12 Mar 2010 00:25:21 +0100
with message-id <20100311232521.GA3568@baikonur.stro.at>
and subject line Re: linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)
has caused the Debian Bug report #517449,
regarding linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
517449: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=517449
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-2.6.26-1-amd64
Version: 2.6.26-13
Severity: important
Tags: patch

* ISSUE
Lenny's kernel is subject to the bug described here:
http://lkml.org/lkml/2009/1/11/70

* ANALYSIS & FIX
It was analysed and fixed in this thread:
http://lkml.org/lkml/2009/1/15/107

(in particular with http://lkml.org/lkml/2009/1/15/231 and http://lkml.org/lkml/2009/1/15/240, AFAIU)

FWIW, the fix seems to have made it into 2.6.28.y at least, with:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=046e7f77d734778a3b2e7d51ce63da3dbe7a8168
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=df94b12439ca449a852e579fc2952dac80f70c90

* TYPICAL SYMPTOMS
Running tasks under SCHED_IDLE (such as BOINC) makes the system sluggish and randomly unresponsive.
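
For anyone trying to reproduce this, here is a minimal sketch of putting a CPU-bound process into the SCHED_IDLE class from Python. It assumes Linux and Python 3.3 or later; enter_sched_idle() and burn_cpu() are hypothetical helper names, and the busy loop is only a stand-in for a load such as BOINC, not how BOINC itself sets its policy.

```python
import os
import time


def enter_sched_idle(pid=0):
    """Switch pid (0 means the calling process) to SCHED_IDLE.

    Returns the scheduling policy reported by the kernel afterwards.
    """
    # SCHED_IDLE has no priority levels, so the static priority must be 0.
    os.sched_setscheduler(pid, os.SCHED_IDLE, os.sched_param(0))
    return os.sched_getscheduler(pid)


def burn_cpu(seconds):
    """Hypothetical stand-in for a CPU-bound background job.

    Spins for roughly `seconds` of wall-clock time and returns the
    number of loop iterations completed.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations
```

Calling enter_sched_idle() and then burn_cpu(600) in one such process per core should roughly approximate the load described here; whether that is enough to trigger the hang on a given machine is an assumption, not something verified.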

Messages such as this one appear in dmesg:
[1830473.188790] INFO: task pdflush:3945 blocked for more than 120 seconds.
[1830473.269257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1830473.365255] pdflush       D ffff81003c595800     0  3945      2
[1830473.365274]  ffff81002f055d60 0000000000000046 ffff81002f055d70 ffffffff804289af
[1830473.365278]  ffff8100258eaee0 ffff81003e966f60 ffff8100258eb168 000000028031dc67
[1830473.365281]  0000000000000008 ffffffffa03445b4 ffff81000a2e8340 000000000000000b
[1830473.365283] Call Trace:
[1830473.365380]  [<ffffffff804289af>] thread_return+0x6b/0xac
[1830473.365448]  [<ffffffffa03445b4>] :xfs:xfs_log_move_tail+0x46/0x12c
[1830473.365472]  [<ffffffffa035b055>] :xfs:xfs_buf_wait_unpin+0x86/0xa8
[1830473.365479]  [<ffffffff8022c202>] default_wake_function+0x0/0xe
[1830473.365503]  [<ffffffffa035b49b>] :xfs:xfs_buf_iorequest+0x20/0x61
[1830473.365538]  [<ffffffffa035ec2e>] :xfs:xfs_bdstrat_cb+0x36/0x3a
[1830473.365559]  [<ffffffffa0357d59>] :xfs:xfs_bwrite+0x5e/0xbb
[1830473.365580]  [<ffffffffa0352209>] :xfs:xfs_syncsub+0x119/0x226
[1830473.365602]  [<ffffffffa03600d4>] :xfs:xfs_fs_write_super+0x1b/0x21
[1830473.365608]  [<ffffffff8029cd90>] sync_supers+0x60/0xa4
[1830473.365615]  [<ffffffff802783f2>] pdflush+0x0/0x211
[1830473.365619]  [<ffffffff80277fb9>] wb_kupdate+0x2d/0x10d
[1830473.369036]  [<ffffffff802783f2>] pdflush+0x0/0x211
[1830473.369036]  [<ffffffff80278556>] pdflush+0x164/0x211
[1830473.369036]  [<ffffffff80277f8c>] wb_kupdate+0x0/0x10d
[1830473.369036]  [<ffffffff80246083>] kthread+0x47/0x74
[1830473.369036]  [<ffffffff80230196>] schedule_tail+0x27/0x5c
[1830473.369036]  [<ffffffff8020cf28>] child_rip+0xa/0x12
[1830473.369036]  [<ffffffff80213299>] restore_i387_ia32+0xb0/0xd4
[1830473.369036]  [<ffffffff8024603c>] kthread+0x0/0x74
[1830473.369036]  [<ffffffff8020cf1e>] child_rip+0x0/0x12
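
To spot these reports without reading the whole log, the following is a minimal sketch that scans dmesg-style text for hung-task lines. The function name and the regular expression are illustrative and written only against the message format shown above; other kernel versions may word the message differently.

```python
import re

# Matches lines such as:
# [1830473.188790] INFO: task pdflush:3945 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return (timestamp, task name, pid, seconds) for each hung-task line."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK.match(line)
        if match:
            hits.append((
                float(match["ts"]),
                match["comm"],
                int(match["pid"]),
                int(match["secs"]),
            ))
    return hits
```

Feeding it the output of dmesg (for example via subprocess.run(["dmesg"], capture_output=True, text=True).stdout) gives a quick list of which tasks were reported as blocked and when.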

Sometimes keyboard input yields repeated keystrokes and SSH sessions stop echoing. Basically, everything freezes
for 2 minutes.

I believe this bug is related to #498328, #499046 and possibly #499198.

This is an extremely nasty bug. I've seen it very frequently while running BOINC on a Xen dom0 on a 16-core box (using
the Debian Xen kernel). A temporary workaround has been to cap BOINC at 90% CPU usage: freezes still happen but are shorter.

HTH

-- Package-specific info:
** Version:
Linux version 2.6.26-1-amd64 (Debian 2.6.26-13) (waldi@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-24)) #1 SMP Sat Jan 10 17:57:00 UTC 2009


-- System Information:
Debian Release: 5.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-1-amd64 depends on:
ii  debconf [debconf-2.0]         1.5.24     Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92o      tools for generating an initramfs
ii  module-init-tools             3.4-1      tools for managing Linux kernel mo

linux-image-2.6.26-1-amd64 recommends no packages.

Versions of packages linux-image-2.6.26-1-amd64 suggests:
ii  grub                       0.97-47lenny2 GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26           <none>        (no description available)

-- debconf information:
  linux-image-2.6.26-1-amd64/postinst/create-kimage-link-2.6.26-1-amd64: true
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.26-1-amd64/postinst/kimage-is-a-directory:
  linux-image-2.6.26-1-amd64/preinst/bootloader-initrd-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/old-initrd-link-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/preinst/initrd-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/postinst/old-system-map-link-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/depmod-error-initrd-2.6.26-1-amd64: false
  linux-image-2.6.26-1-amd64/preinst/overwriting-modules-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/preinst/elilo-initrd-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/bootloader-error-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/preinst/abort-install-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/preinst/lilo-initrd-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/depmod-error-2.6.26-1-amd64: false
  linux-image-2.6.26-1-amd64/prerm/removing-running-kernel-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/prerm/would-invalidate-boot-loader-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/bootloader-test-error-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/preinst/abort-overwrite-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/postinst/old-dir-initrd-link-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/preinst/lilo-has-ramdisk:
  linux-image-2.6.26-1-amd64/preinst/failed-to-move-modules-2.6.26-1-amd64:



--- End Message ---
--- Begin Message ---
Version: 2.6.26-21

This should have been fixed in a stable update, thus closing.

Thanks for the report.

--- End Message ---