[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#516374: New workloads trigger 'INFO: task * blocked for more than 120 seconds.'



We're still seeing this, but with different workloads than before (i.e., the
workload which was affected in our earlier reports was fixed by
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=54;bug=517449, but there
are other workloads under which we're now observing this).

The common thread seems to be a single task suddenly performing lots of I/O
on an otherwise very lightly loaded machine. updatedb.mlocate has triggered
this a few times; other times it's been a single KVM virtual machine on an
otherwise idle physical machine. Sometimes it's that particular task that
blocks and triggers this warning; other times it's unrelated tasks (sshd,
for example).

This seems to happen most readily with NFS I/O when network latency is
relatively high (we have a few cross-country NFS mounts that aren't the
fastest thing on four wheels), but we've also seen this with predominantly
local disk I/O. FWIW, this is with kernels that have the performance
regression in the RPC auth code (#524199) fixed.

The machines don't seem to lock up as was the case before, but the affected
tasks are obviously unresponsive.

john

[697382.442558] INFO: task sshd:27512 blocked for more than 120 seconds.
[697382.442558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[697382.442558] sshd          D ffff810001031250     0 27512   3826
[697382.442558]  ffff8105bc8d9938 0000000000000086 0000000000000010 ffff81082ccdd000
[697382.442558]  ffff81082b13ead0 ffff8108299a2910 ffff81082b13ed58 0000000400000000
[697382.442558]  0f480000bc8d9928 ffffffff80295f4d 0000000000011200 0000000000000292
[697382.442558] Call Trace:
[697382.442558]  [<ffffffff80295f4d>] cache_alloc_refill+0x111/0x20c
[697382.442558]  [<ffffffffa02893ca>] :sunrpc:rpc_wait_bit_killable+0x0/0x31
[697382.442558]  [<ffffffffa02893f4>] :sunrpc:rpc_wait_bit_killable+0x2a/0x31
[697382.442558]  [<ffffffff80429332>] __wait_on_bit+0x40/0x6e
[697382.442558]  [<ffffffffa02893ca>] :sunrpc:rpc_wait_bit_killable+0x0/0x31
[697382.442558]  [<ffffffff804293cc>] out_of_line_wait_on_bit+0x6c/0x78
[697382.442558]  [<ffffffff8024622f>] wake_bit_function+0x0/0x23
[697382.442558]  [<ffffffffa02862e9>] :sunrpc:xprt_connect+0x89/0x123
[697382.442558]  [<ffffffffa028998f>] :sunrpc:__rpc_execute+0xe6/0x223
[697382.442558]  [<ffffffffa0283bcb>] :sunrpc:rpc_run_task+0x4f/0x56
[697382.442558]  [<ffffffffa0283c67>] :sunrpc:rpc_call_sync+0x3e/0x5b
[697382.442558]  [<ffffffffa02dbefe>] :nfs:nfs3_rpc_wrapper+0x19/0x50
[697382.442558]  [<ffffffffa02dc5cf>] :nfs:nfs3_proc_access+0x125/0x191
[697382.442558]  [<ffffffff8023cfeb>] try_to_del_timer_sync+0x51/0x5a
[697382.442558]  [<ffffffff8023d000>] del_timer_sync+0xc/0x16
[697382.442558]  [<ffffffff80246388>] remove_wait_queue+0x12/0x45
[697382.442558]  [<ffffffff802a6eeb>] free_poll_entry+0x11/0x1a
[697382.442558]  [<ffffffff802a6f1d>] poll_freewait+0x29/0x6a
[697382.442558]  [<ffffffffa02cd1e8>] :nfs:nfs_do_access+0x163/0x30c
[697382.442558]  [<ffffffffa02cd481>] :nfs:nfs_permission+0xf0/0x15f
[697382.442558]  [<ffffffff802a21f7>] permission+0xb5/0x118
[697382.442558]  [<ffffffff802a377f>] __link_path_walk+0x150/0xd05
[697382.442558]  [<ffffffff8021a4ee>] apic_wait_icr_idle+0xe/0x15
[697382.442558]  [<ffffffff8021a250>] native_flush_tlb_others+0x5d/0x83
[697382.442558]  [<ffffffff802a437a>] path_walk+0x46/0x8b
[697382.442558]  [<ffffffff802a46a6>] do_path_lookup+0x158/0x1cf
[697382.442558]  [<ffffffff802a51ea>] __path_lookup_intent_open+0x56/0x96
[697382.442558]  [<ffffffff802a531f>] do_filp_open+0x9c/0x7c4
[697382.442558]  [<ffffffff8023e186>] lock_task_sighand+0x2c/0x52
[697382.442558]  [<ffffffff80221fbc>] do_page_fault+0x5d8/0x9c8
[697382.442558]  [<ffffffff802995a0>] get_unused_fd_flags+0x71/0x115
[697382.442558]  [<ffffffff8029968a>] do_sys_open+0x46/0xc3
[697382.442558]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

[1726155.948043] INFO: task bzip2:456 blocked for more than 120 seconds.
[1726155.949118] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1726155.950281] bzip2         D ffff8100010190f0     0   456    450
[1726155.950285]  ffff810008209c78 0000000000000046 ffff8101001334c8 ffffffffa01d2b18
[1726155.950289]  ffff81011fa90080 ffff81011fba5120 ffff81011fa90308 00000000001334c8
[1726155.950291]  ffff81011d1a4800 ffffffffa01cecff 0000000000000001 0000000000000000
[1726155.950294] Call Trace:
[1726155.950363]  [<ffffffffa01d2b18>] :sunrpc:rpc_sleep_on+0x21/0x2a3
[1726155.950376]  [<ffffffffa01cecff>] :sunrpc:xprt_reserve+0x14b/0x15a
[1726155.950429]  [<ffffffffa021fa73>] :nfs:nfs_wait_bit_killable+0x0/0x30
[1726155.950440]  [<ffffffffa021fa9d>] :nfs:nfs_wait_bit_killable+0x2a/0x30
[1726155.950466]  [<ffffffff80429312>] __wait_on_bit+0x40/0x6e
[1726155.950477]  [<ffffffffa021fa73>] :nfs:nfs_wait_bit_killable+0x0/0x30
[1726155.950481]  [<ffffffff804293ac>] out_of_line_wait_on_bit+0x6c/0x78
[1726155.950495]  [<ffffffff80246273>] wake_bit_function+0x0/0x23
[1726155.950508]  [<ffffffffa0223565>] :nfs:nfs_sync_mapping_wait+0xed/0x298
[1726155.950521]  [<ffffffffa02238f2>] :nfs:__nfs_write_mapping+0x2c/0x4f
[1726155.950533]  [<ffffffffa0223958>] :nfs:nfs_write_mapping+0x43/0x65
[1726155.950544]  [<ffffffffa0218604>] :nfs:nfs_do_fsync+0x19/0x34
[1726155.950553]  [<ffffffffa0218a61>] :nfs:nfs_file_flush+0x84/0xac
[1726155.950568]  [<ffffffff80299549>] filp_close+0x37/0x65
[1726155.950583]  [<ffffffff80236776>] put_files_struct+0x66/0xc4
[1726155.950586]  [<ffffffff80237a03>] do_exit+0x274/0x6a6
[1726155.950603]  [<ffffffff8031e0ef>] __up_write+0x21/0x10e
[1726155.950607]  [<ffffffff80237ea2>] do_group_exit+0x6d/0x9d
[1726155.950609]  [<ffffffff80237ee4>] sys_exit_group+0x12/0x16
[1726155.950615]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

[22652485.379503] INFO: task kvm:5135 blocked for more than 120 seconds.
[22652485.379503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[22652485.379503] kvm           D 0000000000000000     0  5135      1
[22652485.379503]  ffff81042401fbe8 0000000000000082 0000000000000000 ffff810008a652f0
[22652485.379503]  ffff81042bbdf7b0 ffffffff804f8480 ffff81042bbdfa38 00000000b6acf690
[22652485.379503]  0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[22652485.379503] Call Trace:
[22652485.379503]  [<ffffffff80271066>] sync_page+0x0/0x41
[22652485.379503]  [<ffffffff804284e7>] io_schedule+0x5c/0x9e
[22652485.379503]  [<ffffffff802710a2>] sync_page+0x3c/0x41
[22652485.379503]  [<ffffffff80428742>] __wait_on_bit+0x40/0x6e
[22652485.379503]  [<ffffffff802712ed>] wait_on_page_bit+0x6b/0x71
[22652485.379503]  [<ffffffff80246213>] wake_bit_function+0x0/0x23
[22652485.379503]  [<ffffffff80278dca>] pagevec_lookup_tag+0x1a/0x21
[22652485.379503]  [<ffffffff80271804>] wait_on_page_writeback_range+0x66/0x113
[22652485.379503]  [<ffffffff80277732>] do_writepages+0x27/0x2d
[22652485.379503]  [<ffffffff802b66ee>] generic_osync_inode+0x5e/0xcf
[22652485.379503]  [<ffffffff80272771>] generic_file_aio_write+0xa6/0xc1
[22652485.379503]  [<ffffffffa018025a>] :ext3:ext3_file_write+0x16/0x94
[22652485.379503]  [<ffffffff8029ac3b>] do_sync_write+0xc9/0x10c
[22652485.379503]  [<ffffffff802461e5>] autoremove_wake_function+0x0/0x2e
[22652485.379503]  [<ffffffffa02aa36e>] :kvm:kvm_vm_ioctl+0x19c/0x1b5
[22652485.379503]  [<ffffffff8023fc52>] sys_rt_sigtimedwait+0xf1/0x25f
[22652485.379503]  [<ffffffff8029b3e5>] vfs_write+0xad/0x156
[22652485.379503]  [<ffffffff8029b987>] sys_write+0x45/0x6e
[22652485.379503]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__



Reply to: