
Bug#855956: linux-image-4.9.0-0.bpo.1-amd64-unsigned: kworker blocked for more than 120 seconds.



Package: linux-image-4.9.0-0.bpo.1-amd64-unsigned
Version: 4.9.2-2~bpo8+1
Severity: important

Dear Debian folks,

Apologies for the duplicate; I am resending this to the main submission list.

After upgrading 3 Debian Jessie servers to kernel 4.9.2 from
jessie-backports, I encountered multiple kernel traces on all 3
servers at random times after boot. As a result, one of the kworker
processes and several user space processes are stuck in 'D' state and
have trouble accessing the XFS file system.
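
The stuck tasks described above can be listed without ps by reading
/proc directly. The following is a minimal sketch, assuming Python 3 is
available on the affected hosts; the function names parse_stat_line and
tasks_in_d_state are illustrative and not part of any existing tool. It
only walks the top-level /proc/<pid> entries, so individual threads of a
multi-threaded process are not inspected.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def tasks_in_d_state(proc_root="/proc"):
    """Return a list of (pid, comm) for tasks in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # the process exited or its stat file is unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in tasks_in_d_state():
        print(pid, comm)
```

Run as root or as a regular user; /proc/<pid>/stat is world-readable on
a default Debian installation.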

When this occurs, a clean shutdown does not complete and a reset is
required. In one case this caused file system corruption and prevented
the server from booting normally (it dropped into a generic GRUB shell).

So far I have not identified a specific workflow that triggers these events.

The systems involved are:

System Information
    Manufacturer: Supermicro
    Product Name: X9DRT

System Information
    Manufacturer: Supermicro
    Product Name: X8DTT-H

Example traces from each server follow:

Server1:
[812709.892923] INFO: task kworker/u49:0:13127 blocked for more than 120 seconds.
[812709.892989]       Tainted: G            E   4.9.0-0.bpo.1-amd64 #1
[812709.893031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[812709.893085] kworker/u49:0   D    0 13127      2 0x00000000
[812709.893100] Workqueue: writeback wb_workfn (flush-253:0)
[812709.893104]  ffff8f8d1a76d800 0000000000000000 ffff8f8d1bda3140 ffff8f8aa3e550c0
[812709.893108]  ffff8f8d1fb587c0 ffffb6820c9477a0 ffffffff81ff536d 00000001c053ad80
[812709.893112]  0000000000000000 0000000018af7798 ffffffff81abb2ee ffff8f8aa3e550c0
[812709.893116] Call Trace:
[812709.893125]  [<ffffffff81ff536d>] ? __schedule+0x23d/0x6d0
[812709.893130]  [<ffffffff81abb2ee>] ? __wake_up_common+0x4e/0x90
[812709.893133]  [<ffffffff81ff60a0>] ? bit_wait_timeout+0x90/0x90
[812709.893135]  [<ffffffff81ff5832>] ? schedule+0x32/0x80
[812709.893138]  [<ffffffff81ff8d3c>] ? schedule_timeout+0x21c/0x3c0
[812709.893145]  [<ffffffff81cfe860>] ? blk_flush_plug_list+0xa0/0x220
[812709.893148]  [<ffffffff81cfe860>] ? blk_flush_plug_list+0xa0/0x220
[812709.893151]  [<ffffffff81ff60a0>] ? bit_wait_timeout+0x90/0x90
[812709.893153]  [<ffffffff81ff50b4>] ? io_schedule_timeout+0xb4/0x130
[812709.893156]  [<ffffffff81abb5a7>] ? prepare_to_wait_exclusive+0x57/0x80
[812709.893158]  [<ffffffff81ff60b7>] ? bit_wait_io+0x17/0x60
[812709.893161]  [<ffffffff81ff5c5f>] ? __wait_on_bit_lock+0x7f/0xb0
[812709.893163]  [<ffffffff81ff60a0>] ? bit_wait_timeout+0x90/0x90
[812709.893166]  [<ffffffff81ff5e6e>] ? out_of_line_wait_on_bit_lock+0x7e/0xa0
[812709.893169]  [<ffffffff81abb810>] ? autoremove_wake_function+0x40/0x40
[812709.893238]  [<ffffffffc054c6dd>] ? xfs_do_writepage+0x44d/0x700 [xfs]
[812709.893244]  [<ffffffff81bc3cee>] ? page_mkclean+0x6e/0xc0
[812709.893247]  [<ffffffff81bc2070>] ? __page_check_address+0x1b0/0x1b0
[812709.893252]  [<ffffffff81b8c997>] ? write_cache_pages+0x207/0x480
[812709.893299]  [<ffffffffc054c290>] ? xfs_aops_discard_page+0x130/0x130 [xfs]
[812709.893310]  [<ffffffffc0397fa6>] ? dm_make_request+0x76/0xc0 [dm_mod]
[812709.893356]  [<ffffffffc054caba>] ? xfs_vm_writepages+0xba/0xf0 [xfs]
[812709.893360]  [<ffffffff81c3232d>] ? __writeback_single_inode+0x3d/0x330
[812709.893363]  [<ffffffff81c32aed>] ? writeback_sb_inodes+0x23d/0x470
[812709.893367]  [<ffffffff81c32da7>] ? __writeback_inodes_wb+0x87/0xb0
[812709.893371]  [<ffffffff81c33122>] ? wb_writeback+0x282/0x310
[812709.893374]  [<ffffffff81c339f4>] ? wb_workfn+0x214/0x3e0
[812709.893378]  [<ffffffff81a9172b>] ? process_one_work+0x14b/0x410
[812709.893381]  [<ffffffff81a921e5>] ? worker_thread+0x65/0x4a0
[812709.893383]  [<ffffffff81a92180>] ? rescuer_thread+0x340/0x340
[812709.893386]  [<ffffffff81a92180>] ? rescuer_thread+0x340/0x340
[812709.893390]  [<ffffffff81a7c689>] ? do_group_exit+0x39/0xb0
[812709.893393]  [<ffffffff81a974e0>] ? kthread+0xe0/0x100
[812709.893398]  [<ffffffff81a2476b>] ? __switch_to+0x2bb/0x700
[812709.893401]  [<ffffffff81a97400>] ? kthread_park+0x60/0x60
[812709.893405]  [<ffffffff81ffa435>] ? ret_from_fork+0x25/0x30

Server2:
[1004725.175725] INFO: task kworker/u49:2:20785 blocked for more than 120 seconds.
[1004725.175790]       Not tainted 4.9.0-0.bpo.1-amd64 #1
[1004725.175818] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1004725.175859] kworker/u49:2   D    0 20785      2 0x00000000
[1004725.175875] Workqueue: writeback wb_workfn (flush-253:0)
[1004725.175879]  ffff92d41a0efc00 0000000000000000 ffff92cc1bdb3080 ffff92cbcfb5b0c0
[1004725.175882]  ffff92cc1fc187c0 ffffa85ae2aab7a0 ffffffffaa3f536d 00000001c0326d80
[1004725.175885]  0000000000000000 000000001b30ca18 ffffffffa9ebb2ee ffff92cbcfb5b0c0
[1004725.175888] Call Trace:
[1004725.175898]  [<ffffffffaa3f536d>] ? __schedule+0x23d/0x6d0
[1004725.175904]  [<ffffffffa9ebb2ee>] ? __wake_up_common+0x4e/0x90
[1004725.175906]  [<ffffffffaa3f60a0>] ? bit_wait_timeout+0x90/0x90
[1004725.175908]  [<ffffffffaa3f5832>] ? schedule+0x32/0x80
[1004725.175910]  [<ffffffffaa3f8d3c>] ? schedule_timeout+0x21c/0x3c0
[1004725.175918]  [<ffffffffaa0fe860>] ? blk_flush_plug_list+0xa0/0x220
[1004725.175921]  [<ffffffffaa0fe860>] ? blk_flush_plug_list+0xa0/0x220
[1004725.175923]  [<ffffffffaa3f60a0>] ? bit_wait_timeout+0x90/0x90
[1004725.175925]  [<ffffffffaa3f50b4>] ? io_schedule_timeout+0xb4/0x130
[1004725.175927]  [<ffffffffa9ebb5a7>] ? prepare_to_wait_exclusive+0x57/0x80
[1004725.175928]  [<ffffffffaa3f60b7>] ? bit_wait_io+0x17/0x60
[1004725.175930]  [<ffffffffaa3f5c5f>] ? __wait_on_bit_lock+0x7f/0xb0
[1004725.175932]  [<ffffffffaa3f60a0>] ? bit_wait_timeout+0x90/0x90
[1004725.175934]  [<ffffffffaa3f5e6e>] ? out_of_line_wait_on_bit_lock+0x7e/0xa0
[1004725.175936]  [<ffffffffa9ebb810>] ? autoremove_wake_function+0x40/0x40
[1004725.176006]  [<ffffffffc03386dd>] ? xfs_do_writepage+0x44d/0x700 [xfs]
[1004725.176011]  [<ffffffffa9fc3cee>] ? page_mkclean+0x6e/0xc0
[1004725.176014]  [<ffffffffa9fc2070>] ? __page_check_address+0x1b0/0x1b0
[1004725.176018]  [<ffffffffa9f8c997>] ? write_cache_pages+0x207/0x480
[1004725.176054]  [<ffffffffc0338290>] ? xfs_aops_discard_page+0x130/0x130 [xfs]
[1004725.176063]  [<ffffffffc02c3fa6>] ? dm_make_request+0x76/0xc0 [dm_mod]
[1004725.176097]  [<ffffffffc0338aba>] ? xfs_vm_writepages+0xba/0xf0 [xfs]
[1004725.176101]  [<ffffffffaa03232d>] ? __writeback_single_inode+0x3d/0x330
[1004725.176105]  [<ffffffffa9eee906>] ? ktime_get+0x36/0xa0
[1004725.176108]  [<ffffffffaa032aed>] ? writeback_sb_inodes+0x23d/0x470
[1004725.176111]  [<ffffffffaa032da7>] ? __writeback_inodes_wb+0x87/0xb0
[1004725.176113]  [<ffffffffaa033122>] ? wb_writeback+0x282/0x310
[1004725.176116]  [<ffffffffaa033a98>] ? wb_workfn+0x2b8/0x3e0
[1004725.176121]  [<ffffffffa9e9172b>] ? process_one_work+0x14b/0x410
[1004725.176123]  [<ffffffffa9e921e5>] ? worker_thread+0x65/0x4a0
[1004725.176125]  [<ffffffffa9e92180>] ? rescuer_thread+0x340/0x340
[1004725.176129]  [<ffffffffa9e7c689>] ? do_group_exit+0x39/0xb0
[1004725.176132]  [<ffffffffa9e974e0>] ? kthread+0xe0/0x100
[1004725.176138]  [<ffffffffa9e2476b>] ? __switch_to+0x2bb/0x700
[1004725.176141]  [<ffffffffa9e97400>] ? kthread_park+0x60/0x60
[1004725.176144]  [<ffffffffaa3fa435>] ? ret_from_fork+0x25/0x30

Server3:
[178825.463856] INFO: task kworker/u49:2:18558 blocked for more than 120 seconds.
[178825.463887]       Not tainted 4.9.0-0.bpo.1-amd64 #1
[178825.463905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[178825.463932] kworker/u49:2   D    0 18558      2 0x00000000
[178825.463962] Workqueue: writeback wb_workfn (flush-253:0)
[178825.463984]  ffff9ab95bb47000 0000000000000000 ffff9ab95bdb80c0 ffff9ab8ac259040
[178825.464014]  ffff9ab95fc587c0 ffffb9e2877df7a0 ffffffff93bf536d 00000001c0318d80
[178825.464043]  0000000000000000 000000005bb3a198 ffffffff936bb2ee ffff9ab8ac259040
[178825.464093] Call Trace:
[178825.464122]  [<ffffffff93bf536d>] ? __schedule+0x23d/0x6d0
[178825.464165]  [<ffffffff936bb2ee>] ? __wake_up_common+0x4e/0x90
[178825.464209]  [<ffffffff93bf60a0>] ? bit_wait_timeout+0x90/0x90
[178825.464232]  [<ffffffff93bf5832>] ? schedule+0x32/0x80
[178825.464265]  [<ffffffff93bf8d3c>] ? schedule_timeout+0x21c/0x3c0
[178825.464289]  [<ffffffff938fe860>] ? blk_flush_plug_list+0xa0/0x220
[178825.464311]  [<ffffffff938fe860>] ? blk_flush_plug_list+0xa0/0x220
[178825.464333]  [<ffffffff93bf60a0>] ? bit_wait_timeout+0x90/0x90
[178825.464354]  [<ffffffff93bf50b4>] ? io_schedule_timeout+0xb4/0x130
[178825.464376]  [<ffffffff936bb5a7>] ? prepare_to_wait_exclusive+0x57/0x80
[178825.464399]  [<ffffffff93bf60b7>] ? bit_wait_io+0x17/0x60
[178825.464418]  [<ffffffff93bf5c5f>] ? __wait_on_bit_lock+0x7f/0xb0
[178825.464439]  [<ffffffff93bf60a0>] ? bit_wait_timeout+0x90/0x90
[178825.464460]  [<ffffffff93bf5e6e>] ? out_of_line_wait_on_bit_lock+0x7e/0xa0
[178825.464484]  [<ffffffff936bb810>] ? autoremove_wake_function+0x40/0x40
[178825.464563]  [<ffffffffc032a6dd>] ? xfs_do_writepage+0x44d/0x700 [xfs]
[178825.464588]  [<ffffffff937c3cee>] ? page_mkclean+0x6e/0xc0
[178825.464608]  [<ffffffff937c2070>] ? __page_check_address+0x1b0/0x1b0
[178825.464645]  [<ffffffff9378c997>] ? write_cache_pages+0x207/0x480
[178825.464705]  [<ffffffffc032a290>] ? xfs_aops_discard_page+0x130/0x130 [xfs]
[178825.464762]  [<ffffffffc01bcfa6>] ? dm_make_request+0x76/0xc0 [dm_mod]
[178825.464851]  [<ffffffffc032aaba>] ? xfs_vm_writepages+0xba/0xf0 [xfs]
[178825.465677]  [<ffffffff9383232d>] ? __writeback_single_inode+0x3d/0x330
[178825.466440]  [<ffffffff9392a737>] ? fprop_reflect_period_percpu.isra.5+0x77/0xb0
[178825.467228]  [<ffffffff93832aed>] ? writeback_sb_inodes+0x23d/0x470
[178825.467992]  [<ffffffff93832da7>] ? __writeback_inodes_wb+0x87/0xb0
[178825.468857]  [<ffffffff93833122>] ? wb_writeback+0x282/0x310
[178825.469565]  [<ffffffff93833a98>] ? wb_workfn+0x2b8/0x3e0
[178825.470307]  [<ffffffff9369172b>] ? process_one_work+0x14b/0x410
[178825.471075]  [<ffffffff936921e5>] ? worker_thread+0x65/0x4a0
[178825.471765]  [<ffffffff93692180>] ? rescuer_thread+0x340/0x340
[178825.472504]  [<ffffffff9367c689>] ? do_group_exit+0x39/0xb0
[178825.473216]  [<ffffffff936974e0>] ? kthread+0xe0/0x100
[178825.474007]  [<ffffffff9362476b>] ? __switch_to+0x2bb/0x700
[178825.474664]  [<ffffffff93697400>] ? kthread_park+0x60/0x60
[178825.475434]  [<ffffffff93bfa435>] ? ret_from_fork+0x25/0x30
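
To compare reports like the three above across machines, the hung-task
header lines can be extracted from saved dmesg output. This is a rough
sketch, not an existing tool: the name hung_tasks and the regular
expression are assumptions based only on the message format shown in
these traces. Whitespace is matched loosely so that the header is found
whether or not a mail client has wrapped it.

```python
import re

# Matches e.g.
# "[812709.892923] INFO: task kworker/u49:0:13127 blocked for more than 120 seconds."
# \s+ also matches a newline, so a wrapped header is still recognised.
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>\S+):(?P<pid>\d+)\s+"
    r"blocked for more than\s+(?P<secs>\d+)\s+seconds"
)


def hung_tasks(log_text):
    """Return (timestamp, comm, pid, seconds) for each hung-task report."""
    return [
        (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for ts, comm, pid, secs in hung_tasks(sys.stdin.read()):
        print(f"{ts:.6f} {comm} pid={pid} blocked>{secs}s")
```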


# uname -a
Linux server 4.9.0-0.bpo.1-amd64 #1 SMP Debian 4.9.2-2~bpo8+1 (2017-01-26) x86_64 GNU/Linux

Please let me know if you need any additional data.

Thank you,

Erik

