
Bug#516374: INFO: task * blocked for more than 120 seconds.



Any ideas on this?  I've been having similar problems on a variety of machines:

[682451.031376] INFO: task murmur.x86:2205 blocked for more than 120 seconds.
[696016.661609] INFO: task syslogd:1745 blocked for more than 120 seconds.
[696205.632817] INFO: task syslogd:1745 blocked for more than 120 seconds.
[696205.633008] INFO: task mysqld:10155 blocked for more than 120 seconds.
[709562.957134] INFO: task kjournald:567 blocked for more than 120 seconds.
[709582.295671] INFO: task syslogd:1745 blocked for more than 120 seconds.
[709612.363511] INFO: task mysqld:22259 blocked for more than 120 seconds.
[709792.771989] INFO: task syslogd:1745 blocked for more than 120 seconds.
[709792.772163] INFO: task mysqld:22259 blocked for more than 120 seconds.
[1469866.661780] INFO: task syslogd:1745 blocked for more than 120 seconds.
[1551108.121338] INFO: task syslogd:1745 blocked for more than 120 seconds.
[1564811.207888] INFO: task syslogd:1745 blocked for more than 120 seconds.
[1564811.208071] INFO: task mysqld:2377 blocked for more than 120 seconds.
[1578213.680390] INFO: task mysqld:3114 blocked for more than 120 seconds.

Here is one of them in full:

[1578213.680390] INFO: task mysqld:3114 blocked for more than 120 seconds.
[1578213.680407] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1578213.680415] mysqld        D 0000000000000800     0  3114   3065
[1578213.680422]  ffff880010ec3da8 0000000000000286 ffff88002bde4a40 ffffffff804ff460
[1578213.680433]  ffff88002bde4a40 ffff88002a174c00 ffff88002bde4cc0 0000000000000000
[1578213.680442]  0000dada00000005 00000000000db265 ffffffff805cb280 ffffffff0000000f
[1578213.680449] Call Trace:
[1578213.680467]  [<ffffffff80223901>] __wake_up+0x38/0x4f
[1578213.680481]  [<ffffffffa0010067>] :jbd:log_wait_commit+0xb6/0x11f
[1578213.680491]  [<ffffffff8023f6c1>] autoremove_wake_function+0x0/0x2e
[1578213.680501]  [<ffffffffa000b552>] :jbd:journal_stop+0x198/0x1f3
[1578213.680510]  [<ffffffff802a7d98>] __writeback_single_inode+0x1bc/0x2da
[1578213.680518]  [<ffffffff802a8839>] sync_inode+0x24/0x53
[1578213.680533]  [<ffffffffa001c48a>] :ext3:ext3_sync_file+0x9e/0xb0
[1578213.680542]  [<ffffffff802aae72>] do_fsync+0x52/0xa4
[1578213.680548]  [<ffffffff802aaee7>] __do_fsync+0x23/0x36
[1578213.680554]  [<ffffffff8020b528>] system_call+0x68/0x6d
[1578213.680560]  [<ffffffff8020b4c0>] system_call+0x0/0x6d
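
For reference, the message itself names the knob that controls this warning.  The
commands below are only a sketch of what can be tuned, assuming the hung-task
detector is compiled into the kernel; the exact set of files under
/proc/sys/kernel/ may differ a little between 2.6.26 and 2.6.32:

  # Show the current threshold (seconds a task may sit in
  # uninterruptible sleep before the warning fires).
  cat /proc/sys/kernel/hung_task_timeout_secs

  # Raise the threshold instead of disabling the check outright
  # (echo 0, as the message says, disables it completely).
  echo 300 > /proc/sys/kernel/hung_task_timeout_secs

  # For a test box only: panic when a hung task is detected,
  # which makes it possible to capture a dump of the stall.
  echo 1 > /proc/sys/kernel/hung_task_panic

Silencing the warning obviously doesn't fix whatever is stalling the I/O; raising
the threshold just helps separate genuinely stuck tasks from ones that are merely
slow under Xen.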

I've had these problems with the 2.6.26 Lenny kernel and with 2.6.30 from
backports.  These machines are fairly lightly used, but they end up
freezing occasionally, or in the worst case eating 100% CPU for several
hours and needing a restart.  I've also tried 2.6.32 from sid; it too
ended up with a blocked task within a day, so I didn't leave it running
any longer to see whether it would crash and burn, since it's a
production server.
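
When one of these guests wedges again, the blocked-task state can be dumped on
demand rather than waiting for the 120-second watchdog.  This is only a rough
sketch, assuming the magic SysRq interface is available in these kernels:

  # Make sure the magic SysRq interface is enabled.
  echo 1 > /proc/sys/kernel/sysrq

  # Dump every task currently in uninterruptible (D) state to the
  # kernel log, so the stacks end up in dmesg/syslog.
  echo w > /proc/sysrq-trigger

  # Dump current memory/writeback information for comparison.
  echo m > /proc/sysrq-trigger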

All the problems I've had are on virtual machines running on Xen hosts
with more than one CPU core: a CentOS 5.4 host running Xen 3.0 (plus,
probably, plenty of additional Red Hat patches), as well as a Linode
server using Xen 3.4.1.  Virtual machines running Lenny with just one
core have no problems at all; however, putting a Lenny virtual machine
on a Xen host with access to more than one CPU core is almost guaranteed
to end up with problems eventually.
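
A quick way to test that theory is to cap a guest at a single vCPU on the host.
A rough sketch for the classic xm toolstack on Xen 3.x ("lenny-guest" is just a
placeholder domain name):

  # Drop the running guest to a single virtual CPU.
  xm vcpu-set lenny-guest 1

  # Or make it permanent in the domU config, e.g. /etc/xen/lenny-guest.cfg:
  #   vcpus = 1

That obviously gives up SMP in the guest, so it's a diagnostic rather than a fix.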

I'm currently running the Etch kernel (2.6.18 Xen), which so far seems
to run okay, although I haven't had time to test it fully.  Even if this
particular kernel works, it obviously isn't a long-term solution, as
support for Etch is eventually going to run out.

Is there any known solution to, or explanation for, these problems?


