Bug#855956: linux-image-4.9.0-0.bpo.1-amd64-unsigned: kworker blocked for more than 120 seconds.
Package: linux-image-4.9.0-0.bpo.1-amd64-unsigned
Version: 4.9.2-2~bpo8+1
Severity: important
Dear Debian folks,
Apologies for the duplication; I am resending this to the main
submission list.
After upgrading three Debian Jessie servers to kernel 4.9.2 from
jessie-backports, I encountered multiple kernel traces on all three
servers at random times after boot. As a result, one of the kworker
processes and several user-space processes are stuck in 'D' state and
have trouble accessing the XFS file system.
A clean shutdown does not complete and a reset is required when this
occurs. In one case this caused file system corruption and prevented
the server from booting normally (it dropped into a generic grub shell).
I have not found a specific workflow that triggers these events so far.
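For anyone trying to confirm the same symptom, tasks in 'D'
(uninterruptible sleep) state can be found by reading the state field of
/proc/<pid>/stat. The following is only a minimal Python sketch and was
not run on the affected servers; the helper names parse_stat_line and
list_d_state_tasks are made up for illustration, and the scan covers
only the top-level entries in /proc, not individual threads.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def list_d_state_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in 'D' state."""
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited during the scan, or the line was malformed.
            continue
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in list_d_state_tasks():
        print(pid, comm)
```

A task that shows up here across several runs, a couple of minutes
apart, would match the "blocked for more than 120 seconds" condition
that the kernel reports.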
The systems involved are:
System Information
Manufacturer: Supermicro
Product Name: X9DRT
System Information
Manufacturer: Supermicro
Product Name: X8DTT-H
The example traces are as follows:
Server1:
[812709.892923] INFO: task kworker/u49:0:13127 blocked for more than 120 seconds.
[812709.892989] Tainted: G E 4.9.0-0.bpo.1-amd64 #1
[812709.893031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[812709.893085] kworker/u49:0 D 0 13127 2 0x00000000
[812709.893100] Workqueue: writeback wb_workfn (flush-253:0)
[812709.893104] ffff8f8d1a76d800 0000000000000000 ffff8f8d1bda3140 ffff8f8aa3e550c0
[812709.893108] ffff8f8d1fb587c0 ffffb6820c9477a0 ffffffff81ff536d 00000001c053ad80
[812709.893112] 0000000000000000 0000000018af7798 ffffffff81abb2ee ffff8f8aa3e550c0
[812709.893116] Call Trace:
[812709.893125] [<ffffffff81ff536d>] ? __schedule+0x23d/0x6d0
[812709.893130] [<ffffffff81abb2ee>] ? __wake_up_common+0x4e/0x90
[812709.893133] [<ffffffff81ff60a0>] ? bit_wait_timeout+0x90/0x90
[812709.893135] [<ffffffff81ff5832>] ? schedule+0x32/0x80
[812709.893138] [<ffffffff81ff8d3c>] ? schedule_timeout+0x21c/0x3c0
[812709.893145] [<ffffffff81cfe860>] ? blk_flush_plug_list+0xa0/0x220
[812709.893148] [<ffffffff81cfe860>] ? blk_flush_plug_list+0xa0/0x220
[812709.893151] [<ffffffff81ff60a0>] ? bit_wait_timeout+0x90/0x90
[812709.893153] [<ffffffff81ff50b4>] ? io_schedule_timeout+0xb4/0x130
[812709.893156] [<ffffffff81abb5a7>] ? prepare_to_wait_exclusive+0x57/0x80
[812709.893158] [<ffffffff81ff60b7>] ? bit_wait_io+0x17/0x60
[812709.893161] [<ffffffff81ff5c5f>] ? __wait_on_bit_lock+0x7f/0xb0
[812709.893163] [<ffffffff81ff60a0>] ? bit_wait_timeout+0x90/0x90
[812709.893166] [<ffffffff81ff5e6e>] ? out_of_line_wait_on_bit_lock+0x7e/0xa0
[812709.893169] [<ffffffff81abb810>] ? autoremove_wake_function+0x40/0x40
[812709.893238] [<ffffffffc054c6dd>] ? xfs_do_writepage+0x44d/0x700 [xfs]
[812709.893244] [<ffffffff81bc3cee>] ? page_mkclean+0x6e/0xc0
[812709.893247] [<ffffffff81bc2070>] ? __page_check_address+0x1b0/0x1b0
[812709.893252] [<ffffffff81b8c997>] ? write_cache_pages+0x207/0x480
[812709.893299] [<ffffffffc054c290>] ? xfs_aops_discard_page+0x130/0x130 [xfs]
[812709.893310] [<ffffffffc0397fa6>] ? dm_make_request+0x76/0xc0 [dm_mod]
[812709.893356] [<ffffffffc054caba>] ? xfs_vm_writepages+0xba/0xf0 [xfs]
[812709.893360] [<ffffffff81c3232d>] ? __writeback_single_inode+0x3d/0x330
[812709.893363] [<ffffffff81c32aed>] ? writeback_sb_inodes+0x23d/0x470
[812709.893367] [<ffffffff81c32da7>] ? __writeback_inodes_wb+0x87/0xb0
[812709.893371] [<ffffffff81c33122>] ? wb_writeback+0x282/0x310
[812709.893374] [<ffffffff81c339f4>] ? wb_workfn+0x214/0x3e0
[812709.893378] [<ffffffff81a9172b>] ? process_one_work+0x14b/0x410
[812709.893381] [<ffffffff81a921e5>] ? worker_thread+0x65/0x4a0
[812709.893383] [<ffffffff81a92180>] ? rescuer_thread+0x340/0x340
[812709.893386] [<ffffffff81a92180>] ? rescuer_thread+0x340/0x340
[812709.893390] [<ffffffff81a7c689>] ? do_group_exit+0x39/0xb0
[812709.893393] [<ffffffff81a974e0>] ? kthread+0xe0/0x100
[812709.893398] [<ffffffff81a2476b>] ? __switch_to+0x2bb/0x700
[812709.893401] [<ffffffff81a97400>] ? kthread_park+0x60/0x60
[812709.893405] [<ffffffff81ffa435>] ? ret_from_fork+0x25/0x30
Server2:
[1004725.175725] INFO: task kworker/u49:2:20785 blocked for more than 120 seconds.
[1004725.175790] Not tainted 4.9.0-0.bpo.1-amd64 #1
[1004725.175818] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1004725.175859] kworker/u49:2 D 0 20785 2 0x00000000
[1004725.175875] Workqueue: writeback wb_workfn (flush-253:0)
[1004725.175879] ffff92d41a0efc00 0000000000000000 ffff92cc1bdb3080 ffff92cbcfb5b0c0
[1004725.175882] ffff92cc1fc187c0 ffffa85ae2aab7a0 ffffffffaa3f536d 00000001c0326d80
[1004725.175885] 0000000000000000 000000001b30ca18 ffffffffa9ebb2ee ffff92cbcfb5b0c0
[1004725.175888] Call Trace:
[1004725.175898] [<ffffffffaa3f536d>] ? __schedule+0x23d/0x6d0
[1004725.175904] [<ffffffffa9ebb2ee>] ? __wake_up_common+0x4e/0x90
[1004725.175906] [<ffffffffaa3f60a0>] ? bit_wait_timeout+0x90/0x90
[1004725.175908] [<ffffffffaa3f5832>] ? schedule+0x32/0x80
[1004725.175910] [<ffffffffaa3f8d3c>] ? schedule_timeout+0x21c/0x3c0
[1004725.175918] [<ffffffffaa0fe860>] ? blk_flush_plug_list+0xa0/0x220
[1004725.175921] [<ffffffffaa0fe860>] ? blk_flush_plug_list+0xa0/0x220
[1004725.175923] [<ffffffffaa3f60a0>] ? bit_wait_timeout+0x90/0x90
[1004725.175925] [<ffffffffaa3f50b4>] ? io_schedule_timeout+0xb4/0x130
[1004725.175927] [<ffffffffa9ebb5a7>] ? prepare_to_wait_exclusive+0x57/0x80
[1004725.175928] [<ffffffffaa3f60b7>] ? bit_wait_io+0x17/0x60
[1004725.175930] [<ffffffffaa3f5c5f>] ? __wait_on_bit_lock+0x7f/0xb0
[1004725.175932] [<ffffffffaa3f60a0>] ? bit_wait_timeout+0x90/0x90
[1004725.175934] [<ffffffffaa3f5e6e>] ? out_of_line_wait_on_bit_lock+0x7e/0xa0
[1004725.175936] [<ffffffffa9ebb810>] ? autoremove_wake_function+0x40/0x40
[1004725.176006] [<ffffffffc03386dd>] ? xfs_do_writepage+0x44d/0x700 [xfs]
[1004725.176011] [<ffffffffa9fc3cee>] ? page_mkclean+0x6e/0xc0
[1004725.176014] [<ffffffffa9fc2070>] ? __page_check_address+0x1b0/0x1b0
[1004725.176018] [<ffffffffa9f8c997>] ? write_cache_pages+0x207/0x480
[1004725.176054] [<ffffffffc0338290>] ? xfs_aops_discard_page+0x130/0x130 [xfs]
[1004725.176063] [<ffffffffc02c3fa6>] ? dm_make_request+0x76/0xc0 [dm_mod]
[1004725.176097] [<ffffffffc0338aba>] ? xfs_vm_writepages+0xba/0xf0 [xfs]
[1004725.176101] [<ffffffffaa03232d>] ? __writeback_single_inode+0x3d/0x330
[1004725.176105] [<ffffffffa9eee906>] ? ktime_get+0x36/0xa0
[1004725.176108] [<ffffffffaa032aed>] ? writeback_sb_inodes+0x23d/0x470
[1004725.176111] [<ffffffffaa032da7>] ? __writeback_inodes_wb+0x87/0xb0
[1004725.176113] [<ffffffffaa033122>] ? wb_writeback+0x282/0x310
[1004725.176116] [<ffffffffaa033a98>] ? wb_workfn+0x2b8/0x3e0
[1004725.176121] [<ffffffffa9e9172b>] ? process_one_work+0x14b/0x410
[1004725.176123] [<ffffffffa9e921e5>] ? worker_thread+0x65/0x4a0
[1004725.176125] [<ffffffffa9e92180>] ? rescuer_thread+0x340/0x340
[1004725.176129] [<ffffffffa9e7c689>] ? do_group_exit+0x39/0xb0
[1004725.176132] [<ffffffffa9e974e0>] ? kthread+0xe0/0x100
[1004725.176138] [<ffffffffa9e2476b>] ? __switch_to+0x2bb/0x700
[1004725.176141] [<ffffffffa9e97400>] ? kthread_park+0x60/0x60
[1004725.176144] [<ffffffffaa3fa435>] ? ret_from_fork+0x25/0x30
Server3:
[178825.463856] INFO: task kworker/u49:2:18558 blocked for more than 120 seconds.
[178825.463887] Not tainted 4.9.0-0.bpo.1-amd64 #1
[178825.463905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[178825.463932] kworker/u49:2 D 0 18558 2 0x00000000
[178825.463962] Workqueue: writeback wb_workfn (flush-253:0)
[178825.463984] ffff9ab95bb47000 0000000000000000 ffff9ab95bdb80c0 ffff9ab8ac259040
[178825.464014] ffff9ab95fc587c0 ffffb9e2877df7a0 ffffffff93bf536d 00000001c0318d80
[178825.464043] 0000000000000000 000000005bb3a198 ffffffff936bb2ee ffff9ab8ac259040
[178825.464093] Call Trace:
[178825.464122] [<ffffffff93bf536d>] ? __schedule+0x23d/0x6d0
[178825.464165] [<ffffffff936bb2ee>] ? __wake_up_common+0x4e/0x90
[178825.464209] [<ffffffff93bf60a0>] ? bit_wait_timeout+0x90/0x90
[178825.464232] [<ffffffff93bf5832>] ? schedule+0x32/0x80
[178825.464265] [<ffffffff93bf8d3c>] ? schedule_timeout+0x21c/0x3c0
[178825.464289] [<ffffffff938fe860>] ? blk_flush_plug_list+0xa0/0x220
[178825.464311] [<ffffffff938fe860>] ? blk_flush_plug_list+0xa0/0x220
[178825.464333] [<ffffffff93bf60a0>] ? bit_wait_timeout+0x90/0x90
[178825.464354] [<ffffffff93bf50b4>] ? io_schedule_timeout+0xb4/0x130
[178825.464376] [<ffffffff936bb5a7>] ? prepare_to_wait_exclusive+0x57/0x80
[178825.464399] [<ffffffff93bf60b7>] ? bit_wait_io+0x17/0x60
[178825.464418] [<ffffffff93bf5c5f>] ? __wait_on_bit_lock+0x7f/0xb0
[178825.464439] [<ffffffff93bf60a0>] ? bit_wait_timeout+0x90/0x90
[178825.464460] [<ffffffff93bf5e6e>] ? out_of_line_wait_on_bit_lock+0x7e/0xa0
[178825.464484] [<ffffffff936bb810>] ? autoremove_wake_function+0x40/0x40
[178825.464563] [<ffffffffc032a6dd>] ? xfs_do_writepage+0x44d/0x700 [xfs]
[178825.464588] [<ffffffff937c3cee>] ? page_mkclean+0x6e/0xc0
[178825.464608] [<ffffffff937c2070>] ? __page_check_address+0x1b0/0x1b0
[178825.464645] [<ffffffff9378c997>] ? write_cache_pages+0x207/0x480
[178825.464705] [<ffffffffc032a290>] ? xfs_aops_discard_page+0x130/0x130 [xfs]
[178825.464762] [<ffffffffc01bcfa6>] ? dm_make_request+0x76/0xc0 [dm_mod]
[178825.464851] [<ffffffffc032aaba>] ? xfs_vm_writepages+0xba/0xf0 [xfs]
[178825.465677] [<ffffffff9383232d>] ? __writeback_single_inode+0x3d/0x330
[178825.466440] [<ffffffff9392a737>] ? fprop_reflect_period_percpu.isra.5+0x77/0xb0
[178825.467228] [<ffffffff93832aed>] ? writeback_sb_inodes+0x23d/0x470
[178825.467992] [<ffffffff93832da7>] ? __writeback_inodes_wb+0x87/0xb0
[178825.468857] [<ffffffff93833122>] ? wb_writeback+0x282/0x310
[178825.469565] [<ffffffff93833a98>] ? wb_workfn+0x2b8/0x3e0
[178825.470307] [<ffffffff9369172b>] ? process_one_work+0x14b/0x410
[178825.471075] [<ffffffff936921e5>] ? worker_thread+0x65/0x4a0
[178825.471765] [<ffffffff93692180>] ? rescuer_thread+0x340/0x340
[178825.472504] [<ffffffff9367c689>] ? do_group_exit+0x39/0xb0
[178825.473216] [<ffffffff936974e0>] ? kthread+0xe0/0x100
[178825.474007] [<ffffffff9362476b>] ? __switch_to+0x2bb/0x700
[178825.474664] [<ffffffff93697400>] ? kthread_park+0x60/0x60
[178825.475434] [<ffffffff93bfa435>] ? ret_from_fork+0x25/0x30
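To make traces like the ones above easier to compare across servers,
the hung-task header and the call-trace symbols can be pulled out of a
saved log. This is a rough Python sketch under the assumption that the
log text looks like the excerpts in this report; the names hung_tasks
and frame_symbols are hypothetical helpers, not part of any existing
tool.

```python
import re

# The timeout value may end up on the next line when a mail client has
# wrapped the log, hence \s+ between "than" and the number.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than\s+"
    r"(?P<secs>\d+) seconds"
)

# Matches call-trace frames such as "? xfs_do_writepage+0x44d/0x700 [xfs]".
FRAME_RE = re.compile(r"\?\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def hung_tasks(log_text):
    """Return a list of (comm, pid, timeout_seconds) for each report."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


def frame_symbols(log_text):
    """Return the function names of all call-trace frames, in order."""
    return [m.group("symbol") for m in FRAME_RE.finditer(log_text)]
```

Running frame_symbols on each server's trace separately and comparing
the resulting lists should show whether the tasks are blocked on the
same path; in the three traces above, each passes through
xfs_vm_writepages, write_cache_pages and xfs_do_writepage.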
# uname -a
Linux server 4.9.0-0.bpo.1-amd64 #1 SMP Debian 4.9.2-2~bpo8+1 (2017-01-26) x86_64 GNU/Linux
Please let me know if you need any additional data.
Thank you,
Erik