
Bug#517449: linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)



Package: linux-image-2.6.26-1-amd64
Version: 2.6.26-13
Severity: important
Tags: patch

* ISSUE
Lenny's kernel is subject to the bug described here:
http://lkml.org/lkml/2009/1/11/70

* ANALYSIS & FIX
It has been analysed and fixed in this thread:
http://lkml.org/lkml/2009/1/15/107

(in particular by the patches at http://lkml.org/lkml/2009/1/15/231 and http://lkml.org/lkml/2009/1/15/240, AFAIU)

FWIW, the fix seems to have made it into 2.6.28.y at least, with these commits:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=046e7f77d734778a3b2e7d51ce63da3dbe7a8168
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=df94b12439ca449a852e579fc2952dac80f70c90

* TYPICAL SYMPTOMS
Basically, running tasks under the SCHED_IDLE policy (a.k.a. SCHED_IDLEPRIO), such as BOINC, makes the system sluggish and randomly unresponsive.

Messages such as this one appear in dmesg:
[1830473.188790] INFO: task pdflush:3945 blocked for more than 120 seconds.
[1830473.269257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1830473.365255] pdflush       D ffff81003c595800     0  3945      2
[1830473.365274]  ffff81002f055d60 0000000000000046 ffff81002f055d70 ffffffff804289af
[1830473.365278]  ffff8100258eaee0 ffff81003e966f60 ffff8100258eb168 000000028031dc67
[1830473.365281]  0000000000000008 ffffffffa03445b4 ffff81000a2e8340 000000000000000b
[1830473.365283] Call Trace:
[1830473.365380]  [<ffffffff804289af>] thread_return+0x6b/0xac
[1830473.365448]  [<ffffffffa03445b4>] :xfs:xfs_log_move_tail+0x46/0x12c
[1830473.365472]  [<ffffffffa035b055>] :xfs:xfs_buf_wait_unpin+0x86/0xa8
[1830473.365479]  [<ffffffff8022c202>] default_wake_function+0x0/0xe
[1830473.365503]  [<ffffffffa035b49b>] :xfs:xfs_buf_iorequest+0x20/0x61
[1830473.365538]  [<ffffffffa035ec2e>] :xfs:xfs_bdstrat_cb+0x36/0x3a
[1830473.365559]  [<ffffffffa0357d59>] :xfs:xfs_bwrite+0x5e/0xbb
[1830473.365580]  [<ffffffffa0352209>] :xfs:xfs_syncsub+0x119/0x226
[1830473.365602]  [<ffffffffa03600d4>] :xfs:xfs_fs_write_super+0x1b/0x21
[1830473.365608]  [<ffffffff8029cd90>] sync_supers+0x60/0xa4
[1830473.365615]  [<ffffffff802783f2>] pdflush+0x0/0x211
[1830473.365619]  [<ffffffff80277fb9>] wb_kupdate+0x2d/0x10d
[1830473.369036]  [<ffffffff802783f2>] pdflush+0x0/0x211
[1830473.369036]  [<ffffffff80278556>] pdflush+0x164/0x211
[1830473.369036]  [<ffffffff80277f8c>] wb_kupdate+0x0/0x10d
[1830473.369036]  [<ffffffff80246083>] kthread+0x47/0x74
[1830473.369036]  [<ffffffff80230196>] schedule_tail+0x27/0x5c
[1830473.369036]  [<ffffffff8020cf28>] child_rip+0xa/0x12
[1830473.369036]  [<ffffffff80213299>] restore_i387_ia32+0xb0/0xd4
[1830473.369036]  [<ffffffff8024603c>] kthread+0x0/0x74
[1830473.369036]  [<ffffffff8020cf1e>] child_rip+0x0/0x12

Sometimes keyboard input yields repeated keystrokes, SSH sessions stop echoing, and basically hell freezes over for 2 minutes.
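A minimal sketch, not part of the original report, of a workload resembling the one described above: one CPU-bound busy loop per CPU, each switched to the mainline SCHED_IDLE policy with Python's os.sched_setscheduler() (assumes Python 3.3+ on Linux; on Linux an unprivileged process may normally switch itself into SCHED_IDLE). The helper names run_idle_load and _spin_at_sched_idle are hypothetical, and whether this load alone triggers the 120-second hangs on an affected 2.6.26 kernel is an assumption; the report only says BOINC does.

```python
import os
import time


def _spin_at_sched_idle(seconds: float) -> int:
    """Child body: switch to SCHED_IDLE, burn CPU, report the policy.

    Returns 0 if the process really ran under SCHED_IDLE, 1 otherwise.
    """
    # SCHED_IDLE ignores the static priority, which must be 0.
    os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
    ok = os.sched_getscheduler(0) == os.SCHED_IDLE
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # pure CPU burn, standing in for a CPU-bound BOINC task
    return 0 if ok else 1


def run_idle_load(seconds: float = 1.0, workers: int = 0) -> list:
    """Fork `workers` busy loops (0 means one per CPU) at SCHED_IDLE.

    Returns each child's exit code; 0 means it ran under SCHED_IDLE.
    """
    workers = workers or os.cpu_count() or 1
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: never return into the parent's code path.
            try:
                code = _spin_at_sched_idle(seconds)
            except BaseException:
                code = 1
            os._exit(code)
        pids.append(pid)

    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
    return codes


if __name__ == "__main__":
    # Hypothetical usage: keep this running while doing disk I/O in
    # another terminal, and watch dmesg for "blocked for more than
    # 120 seconds" messages.
    print(run_idle_load(seconds=300.0))
```

A child exits with 0 only if it actually ran under SCHED_IDLE, so a non-zero code suggests the kernel lacks SCHED_IDLE or a sandbox blocks sched_setscheduler().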

I believe this bug relates to #498328, #499046 and possibly #499198.

This is an extremely nasty bug. I've seen it very frequently while running BOINC in a Xen dom0 on a 16-core box (using the Debian Xen kernel). A temporary workaround has been to cap BOINC at 90% CPU usage: freezes still happen but are shorter.

HTH

-- Package-specific info:
** Version:
Linux version 2.6.26-1-amd64 (Debian 2.6.26-13) (waldi@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-24)) #1 SMP Sat Jan 10 17:57:00 UTC 2009


-- System Information:
Debian Release: 5.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-1-amd64 depends on:
ii  debconf [debconf-2.0]         1.5.24     Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92o      tools for generating an initramfs
ii  module-init-tools             3.4-1      tools for managing Linux kernel mo

linux-image-2.6.26-1-amd64 recommends no packages.

Versions of packages linux-image-2.6.26-1-amd64 suggests:
ii  grub                       0.97-47lenny2 GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26           <none>        (no description available)

-- debconf information:
  linux-image-2.6.26-1-amd64/postinst/create-kimage-link-2.6.26-1-amd64: true
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.26-1-amd64/postinst/kimage-is-a-directory:
  linux-image-2.6.26-1-amd64/preinst/bootloader-initrd-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/old-initrd-link-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/preinst/initrd-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/postinst/old-system-map-link-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/depmod-error-initrd-2.6.26-1-amd64: false
  linux-image-2.6.26-1-amd64/preinst/overwriting-modules-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/preinst/elilo-initrd-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/bootloader-error-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/preinst/abort-install-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/preinst/lilo-initrd-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/depmod-error-2.6.26-1-amd64: false
  linux-image-2.6.26-1-amd64/prerm/removing-running-kernel-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/prerm/would-invalidate-boot-loader-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/postinst/bootloader-test-error-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/preinst/abort-overwrite-2.6.26-1-amd64:
  linux-image-2.6.26-1-amd64/postinst/old-dir-initrd-link-2.6.26-1-amd64: true
  linux-image-2.6.26-1-amd64/preinst/lilo-has-ramdisk:
  linux-image-2.6.26-1-amd64/preinst/failed-to-move-modules-2.6.26-1-amd64:
