
Bug#671153: machine stopped working in kernel task "bdi-default"



Package: linux-image-2.6.32-5-amd64
Version: 2.6.32-41squeeze2

Hi,
this morning this machine had stopped working: it was responding to
ping and nothing else (no postgresql, no ssh, no samba, no ldaps, no
apache2). The machine was not powered off, and since it is headless I
could not check anything on the console.

After power-cycling it, I found this message in syslog:
May  2 07:10:01 /USR/SBIN/CRON[25505]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May  2 07:10:29 kernel: [19241.580034] INFO: task bdi-default:25 blocked for more than 120 seconds.
May  2 07:10:29 kernel: [19241.580037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  2 07:10:29 kernel: [19241.580039] bdi-default   D 0000000000000000     0    25      2 0x00000000
May  2 07:10:29 kernel: [19241.580043]  ffff880196c754c0 0000000000000046 0000000000000000 0000000000000008
May  2 07:10:29 kernel: [19241.580047]  0000000000015780 0000000000015780 000000000000f9e0 ffff880196e5ffd8
May  2 07:10:29 kernel: [19241.580050]  0000000000015780 0000000000015780 ffff880196ccc6a0 ffff880196ccc998
May  2 07:10:29 kernel: [19241.580054] Call Trace:
May  2 07:10:29 kernel: [19241.580062]  [<ffffffff810414f5>] ? select_task_rq_fair+0x472/0x836
May  2 07:10:29 kernel: [19241.580067]  [<ffffffff812fb53d>] ? schedule_timeout+0x2e/0xdd
May  2 07:10:29 kernel: [19241.580070]  [<ffffffff812fb3f4>] ? wait_for_common+0xde/0x15b
May  2 07:10:29 kernel: [19241.580074]  [<ffffffff8104a450>] ? default_wake_function+0x0/0x9
May  2 07:10:29 kernel: [19241.580078]  [<ffffffff81064d8a>] ? kthread_create+0x93/0x121
May  2 07:10:29 kernel: [19241.580082]  [<ffffffff810c8fde>] ? bdi_start_fn+0x0/0xd2
May  2 07:10:29 kernel: [19241.580088]  [<ffffffff8105a854>] ? lock_timer_base+0x26/0x4b
May  2 07:10:29 kernel: [19241.580091]  [<ffffffff8105a8dc>] ? try_to_del_timer_sync+0x63/0x6c
May  2 07:10:29 kernel: [19241.580094]  [<ffffffff8105a8f1>] ? del_timer_sync+0xc/0x16
May  2 07:10:29 kernel: [19241.580096]  [<ffffffff812fb5bc>] ? schedule_timeout+0xad/0xdd
May  2 07:10:29 kernel: [19241.580099]  [<ffffffff8105a970>] ? process_timeout+0x0/0x5
May  2 07:10:29 kernel: [19241.580102]  [<ffffffff810c8f16>] ? bdi_forker_task+0x1f5/0x2bd
May  2 07:10:29 kernel: [19241.580107]  [<ffffffff8103aa76>] ? __wake_up_common+0x44/0x72
May  2 07:10:29 kernel: [19241.580110]  [<ffffffff810c8d21>] ? bdi_forker_task+0x0/0x2bd
May  2 07:10:29 kernel: [19241.580112]  [<ffffffff81064c4d>] ? kthread+0x79/0x81
May  2 07:10:29 kernel: [19241.580116]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
May  2 07:10:29 kernel: [19241.580119]  [<ffffffff81064bd4>] ? kthread+0x0/0x81
May  2 07:10:29 kernel: [19241.580121]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
May  2 07:13:35 /USR/SBIN/CRON[25503]: (CRON) error (grandchild #25505 failed with exit status 1)

This happened while the machine was overloaded by a massive postgresql
import that starts every morning at 7:02.
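
To check whether the same hung-task report shows up on other mornings,
the syslog can be scanned for the "blocked for more than" line. The
following is only a minimal sketch: the function name find_hung_tasks
and the regular expression are my own assumptions, written against the
message format seen in the trace above, not part of any kernel or
Debian tool.

```python
import re

# Matches the kernel's hung-task report as it appears in the log above, e.g.
# "INFO: task bdi-default:25 blocked for more than 120 seconds."
# (pattern is an assumption derived from that one message)
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return a (task name, pid, seconds) tuple for each hung-task line."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("name"),
                 int(match.group("pid")),
                 int(match.group("secs")))
            )
    return hits


# Example using the line from the syslog excerpt above.
sample = ("May  2 07:10:29 kernel: [19241.580034] INFO: task "
          "bdi-default:25 blocked for more than 120 seconds.")
print(find_hung_tasks([sample]))  # [('bdi-default', 25, 120)]
```

Passing an open log file instead of the sample list (for example the
file object for /var/log/syslog) would list every such report in it.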

Thanks,
Giuseppe



