Bug#671153: machine stopped working in kernel task "bdi-default"
Package: linux-image-2.6.32-5-amd64
Version: 2.6.32-41squeeze2
Hi,
this morning this machine had stopped working: it was responding to
ping and nothing else (no postgresql, no ssh, no samba, no ldaps, no
apache2). The machine was not powered off, and since it is headless I
could not check anything on the console.
After power-cycling it, I found this message in syslog:
May 2 07:10:01 /USR/SBIN/CRON[25505]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 2 07:10:29 kernel: [19241.580034] INFO: task bdi-default:25 blocked for more than 120 seconds.
May 2 07:10:29 kernel: [19241.580037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 2 07:10:29 kernel: [19241.580039] bdi-default D 0000000000000000 0 25 2 0x00000000
May 2 07:10:29 kernel: [19241.580043] ffff880196c754c0 0000000000000046 0000000000000000 0000000000000008
May 2 07:10:29 kernel: [19241.580047] 0000000000015780 0000000000015780 000000000000f9e0 ffff880196e5ffd8
May 2 07:10:29 kernel: [19241.580050] 0000000000015780 0000000000015780 ffff880196ccc6a0 ffff880196ccc998
May 2 07:10:29 kernel: [19241.580054] Call Trace:
May 2 07:10:29 kernel: [19241.580062] [<ffffffff810414f5>] ? select_task_rq_fair+0x472/0x836
May 2 07:10:29 kernel: [19241.580067] [<ffffffff812fb53d>] ? schedule_timeout+0x2e/0xdd
May 2 07:10:29 kernel: [19241.580070] [<ffffffff812fb3f4>] ? wait_for_common+0xde/0x15b
May 2 07:10:29 kernel: [19241.580074] [<ffffffff8104a450>] ? default_wake_function+0x0/0x9
May 2 07:10:29 kernel: [19241.580078] [<ffffffff81064d8a>] ? kthread_create+0x93/0x121
May 2 07:10:29 kernel: [19241.580082] [<ffffffff810c8fde>] ? bdi_start_fn+0x0/0xd2
May 2 07:10:29 kernel: [19241.580088] [<ffffffff8105a854>] ? lock_timer_base+0x26/0x4b
May 2 07:10:29 kernel: [19241.580091] [<ffffffff8105a8dc>] ? try_to_del_timer_sync+0x63/0x6c
May 2 07:10:29 kernel: [19241.580094] [<ffffffff8105a8f1>] ? del_timer_sync+0xc/0x16
May 2 07:10:29 kernel: [19241.580096] [<ffffffff812fb5bc>] ? schedule_timeout+0xad/0xdd
May 2 07:10:29 kernel: [19241.580099] [<ffffffff8105a970>] ? process_timeout+0x0/0x5
May 2 07:10:29 kernel: [19241.580102] [<ffffffff810c8f16>] ? bdi_forker_task+0x1f5/0x2bd
May 2 07:10:29 kernel: [19241.580107] [<ffffffff8103aa76>] ? __wake_up_common+0x44/0x72
May 2 07:10:29 kernel: [19241.580110] [<ffffffff810c8d21>] ? bdi_forker_task+0x0/0x2bd
May 2 07:10:29 kernel: [19241.580112] [<ffffffff81064c4d>] ? kthread+0x79/0x81
May 2 07:10:29 kernel: [19241.580116] [<ffffffff81011baa>] ? child_rip+0xa/0x20
May 2 07:10:29 kernel: [19241.580119] [<ffffffff81064bd4>] ? kthread+0x0/0x81
May 2 07:10:29 kernel: [19241.580121] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
May 2 07:13:35 /USR/SBIN/CRON[25503]: (CRON) error (grandchild #25505 failed with exit status 1)
This happened while the machine was heavily loaded by a massive PostgreSQL
import that starts every morning at 7:02.
Thanks,
Giuseppe