Bug#870471: linux-image-3.16.0-4-amd64: random disk lock, jbd2/vda1-8 blocked for more than 120 seconds (kvm/drbd context)
Package: src:linux
Version: 3.16.43-2+deb8u2
Severity: normal
Dear Kernel Maintainers,
I randomly get disk lock on guest VMs on top of a ganeti cluster using
KVM/DRBD. Kernel logs on the guest say:
-----
task jbd2/vda1-8:107 blocked for more than 120 seconds.
[1814040.120033] INFO: task jbd2/vda1-8:107 blocked for more than 120
seconds.
[1814040.120798] Not tainted 3.16.0-4-amd64 #1
[1814040.121297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814040.122132] jbd2/vda1-8 D ffff880036931948 0 107 2
0x00000000
[1814040.122139] ffff8800369314f0 0000000000000046 0000000000012f40
ffff880139f8ffd8
[1814040.122144] 0000000000012f40 ffff8800369314f0 ffff880139f8fdc8
ffff880139f8fe60
[1814040.122148] ffff8800368240b8 ffff8800369314f0 ffff880098d5be00
ffff880139f8fdb0
[1814040.122152] Call Trace:
[1814040.122171] [<ffffffffa0160fb0>] ?
jbd2_journal_commit_transaction+0x260/0x1a30 [jbd2]
[1814040.122179] [<ffffffff8105297b>] ? kvm_clock_read+0x1b/0x20
[1814040.122186] [<ffffffff8101cab5>] ? sched_clock+0x5/0x10
…
-----
There is nothing in the logs (kernel.log, system.log,…) on the hosts.
Hosts and guests both use a stable Jessie kernel.
This report come from the guest having this issue.
The hosts have 256G of RAM and SSDs disks in a hard RAID10 and use
qemu-kvm 2.1+dfsg-12+deb8u6, drbd-utils 8.9.5-1~bpo8+1.
Some people talk about decreasing vm.dirty_background_ratio and
vm.dirty_ratio that could help:
*
https://forum.proxmox.com/threads/vm-blocked-due-to-hung_task_timeout_secs.22488/page-3#post-173136
I was thinking of upgrading the kernel to the jessie-backports version
because some people solve similar issue like this:
* https://phabricator.wikimedia.org/T129891#2853793
Do you think any of the solutions above could be helpful?
Is there an other way I can troubleshoot this issue?
Thanks in advance for your help and your time.
Best regards,
Thomas Pierson
-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc
version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)
** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64
root=UUID=e25af31e-8d5f-497e-b8d1-2e99b63bf41e ro text
console=ttyS0,38400n8 quiet
** Not tainted
** Kernel log:
[1814160.132439] [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0
[1814160.132443] [<ffffffff811aba72>] ? SyS_write+0x42/0xa0
[1814160.132450] [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.132455] INFO: task master:684 blocked for more than 120 seconds.
[1814160.133139] Not tainted 3.16.0-4-amd64 #1
[1814160.133636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.134468] master D ffff8800bb16b9c8 0 684 1
0x00000004
[1814160.134472] ffff8800bb16b570 0000000000000086 0000000000012f40
ffff8800bb243fd8
[1814160.134476] 0000000000012f40 ffff8800bb16b570 ffff880036824000
ffff880036824070
[1814160.134480] 0000000000021dad 0000000000000002 0000000000000000
ffff8801305c7358
[1814160.134484] Call Trace:
[1814160.134491] [<ffffffffa015e074>] ?
wait_transaction_locked+0x74/0xa0 [jbd2]
[1814160.134495] [<ffffffff810a9610>] ? prepare_to_wait_event+0xf0/0xf0
[1814160.134501] [<ffffffffa015e3a1>] ? start_this_handle+0x281/0x5a0
[jbd2]
[1814160.134508] [<ffffffffa015e8b4>] ? jbd2__journal_start+0xf4/0x200
[jbd2]
[1814160.134518] [<ffffffffa0182e05>] ? ext4_dirty_inode+0x25/0x60 [ext4]
[1814160.134522] [<ffffffff811d21e1>] ? __mark_inode_dirty+0x171/0x2b0
[1814160.134526] [<ffffffff811c47b1>] ? update_time+0x81/0xc0
[1814160.134531] [<ffffffff8106c5e2>] ? current_fs_time+0x12/0x60
[1814160.134534] [<ffffffff811c49b0>] ? file_update_time+0x80/0xd0
[1814160.134540] [<ffffffff811b2dc0>] ? pipe_write+0x3e0/0x450
[1814160.134545] [<ffffffff811aa7f4>] ? new_sync_write+0x74/0xa0
[1814160.134549] [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0
[1814160.134552] [<ffffffff811aba72>] ? SyS_write+0x42/0xa0
[1814160.134557] [<ffffffff811bf6ed>] ? SyS_poll+0x5d/0xf0
[1814160.134562] [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.134566] INFO: task pickup:25876 blocked for more than 120 seconds.
[1814160.135273] Not tainted 3.16.0-4-amd64 #1
[1814160.135770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.136615] pickup D ffff880036f25908 0 25876 684
0x00000000
[1814160.136619] ffff880036f254b0 0000000000000086 0000000000012f40
ffff8800baf2bfd8
[1814160.136623] 0000000000012f40 ffff880036f254b0 ffff880036824000
ffff880036824070
[1814160.136633] 0000000000021dad 0000000000000002 0000000000000000
ffff8801305c7058
[1814160.136663] Call Trace:
[1814160.136676] [<ffffffffa015e074>] ?
wait_transaction_locked+0x74/0xa0 [jbd2]
[1814160.136681] [<ffffffff810a9610>] ? prepare_to_wait_event+0xf0/0xf0
[1814160.136687] [<ffffffffa015e3a1>] ? start_this_handle+0x281/0x5a0
[jbd2]
[1814160.136694] [<ffffffffa015e8b4>] ? jbd2__journal_start+0xf4/0x200
[jbd2]
[1814160.136703] [<ffffffffa0182e05>] ? ext4_dirty_inode+0x25/0x60 [ext4]
[1814160.136708] [<ffffffff811d21e1>] ? __mark_inode_dirty+0x171/0x2b0
[1814160.136712] [<ffffffff811c47b1>] ? update_time+0x81/0xc0
[1814160.136716] [<ffffffff811c48e9>] ? touch_atime+0xf9/0x140
[1814160.136732] [<ffffffff811b30b0>] ? pipe_read+0x280/0x2f0
[1814160.136738] [<ffffffff811aa751>] ? new_sync_read+0x71/0xa0
[1814160.136742] [<ffffffff811aada3>] ? vfs_read+0x93/0x170
[1814160.136750] [<ffffffff811ab9d2>] ? SyS_read+0x42/0xa0
[1814160.136755] [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.136762] INFO: task puppet:28066 blocked for more than 120 seconds.
[1814160.137464] Not tainted 3.16.0-4-amd64 #1
[1814160.137960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.138798] puppet D ffff8800bb16a728 0 28066 886
0x00000000
[1814160.138802] ffff8800bb16a2d0 0000000000000082 0000000000012f40
ffff8800baf8ffd8
[1814160.138806] 0000000000012f40 ffff8800bb16a2d0 ffff88013fd137f0
ffff88013ffc3a10
[1814160.138809] ffff8800baf8fac0 0000000000000002 ffffffff811da390
ffff8800bbbd67e0
[1814160.138813] Call Trace:
[1814160.138818] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.138822] [<ffffffff81516f49>] ? io_schedule+0x99/0x120
[1814160.138826] [<ffffffff811da39a>] ? sleep_on_buffer+0xa/0x10
[1814160.138831] [<ffffffff815173d1>] ? __wait_on_bit_lock+0x41/0xa0
[1814160.138834] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.138839] [<ffffffff815174a7>] ?
out_of_line_wait_on_bit_lock+0x77/0x90
[1814160.138843] [<ffffffff810a9640>] ? autoremove_wake_function+0x30/0x30
[1814160.138851] [<ffffffffa015fa10>] ? do_get_write_access+0x250/0x4f0
[jbd2]
[1814160.138858] [<ffffffffa015fcd2>] ?
jbd2_journal_get_write_access+0x22/0x40 [jbd2]
[1814160.138870] [<ffffffffa01a9fa6>] ?
__ext4_journal_get_write_access+0x36/0x80 [ext4]
[1814160.138879] [<ffffffffa0178fe6>] ? __ext4_new_inode+0x436/0x1350
[ext4]
[1814160.138891] [<ffffffffa0189a24>] ? ext4_create+0xc4/0x160 [ext4]
[1814160.138895] [<ffffffff811b8e54>] ? vfs_create+0xc4/0x130
[1814160.138899] [<ffffffff811b9f98>] ? do_last+0x10d8/0x1330
[1814160.138902] [<ffffffff811ba2ab>] ? path_openat+0xbb/0x680
[1814160.138908] [<ffffffff8116c30d>] ? handle_mm_fault+0x84d/0x1150
[1814160.138912] [<ffffffff811bb01a>] ? do_filp_open+0x3a/0x90
[1814160.138916] [<ffffffff811c722c>] ? __alloc_fd+0x7c/0x120
[1814160.138922] [<ffffffff811aa169>] ? do_sys_open+0x129/0x220
[1814160.138927] [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.138930] INFO: task cron:28670 blocked for more than 120 seconds.
[1814160.139613] Not tainted 3.16.0-4-amd64 #1
[1814160.140122] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.140955] cron D ffff880139d347a8 0 28670 343
0x00000000
[1814160.140959] ffff880139d34350 0000000000000082 0000000000012f40
ffff88013a20ffd8
[1814160.140962] 0000000000012f40 ffff880139d34350 ffff88013fc137f0
ffff88013ffcf830
[1814160.140966] 0000000000000002 ffffffff811da390 ffff88013a20fba0
0000000000000000
[1814160.140972] Call Trace:
[1814160.140977] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.140981] [<ffffffff81516f49>] ? io_schedule+0x99/0x120
[1814160.140985] [<ffffffff811da39a>] ? sleep_on_buffer+0xa/0x10
[1814160.140989] [<ffffffff815172cc>] ? __wait_on_bit+0x5c/0x90
[1814160.140992] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.140997] [<ffffffff81517377>] ? out_of_line_wait_on_bit+0x77/0x90
[1814160.141001] [<ffffffff810a9640>] ? autoremove_wake_function+0x30/0x30
[1814160.141013] [<ffffffffa0187ac7>] ? ext4_find_entry+0x117/0x4d0 [ext4]
[1814160.141023] [<ffffffffa0188371>] ? ext4_lookup+0x41/0x140 [ext4]
[1814160.141029] [<ffffffff811b3e39>] ? lookup_real+0x19/0x50
[1814160.141032] [<ffffffff811b9920>] ? do_last+0xa60/0x1330
[1814160.141036] [<ffffffff811ba2ab>] ? path_openat+0xbb/0x680
[1814160.141042] [<ffffffff8116c0fc>] ? handle_mm_fault+0x63c/0x1150
[1814160.141046] [<ffffffff811bb01a>] ? do_filp_open+0x3a/0x90
[1814160.141050] [<ffffffff811c722c>] ? __alloc_fd+0x7c/0x120
[1814160.141055] [<ffffffff811aa169>] ? do_sys_open+0x129/0x220
[1814160.141061] [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
** Model information
sys_vendor: QEMU
product_name: Standard PC (i440FX + PIIX, 1996)
product_version: pc-i440fx-2.1
chassis_vendor: QEMU
chassis_version: pc-i440fx-2.1
bios_vendor: SeaBIOS
bios_version: 1.7.5-20140531_083030-gandalf
** Loaded modules:
xt_multiport
ip_vs
libcrc32c
crc32c_generic
binfmt_misc
nf_conntrack_ipv6
nf_defrag_ipv6
ip6table_filter
ip6_tables
xt_tcpudp
nf_conntrack_ipv4
nf_defrag_ipv4
xt_conntrack
nf_conntrack
iptable_filter
ip_tables
x_tables
ppdev
evdev
serio_raw
pcspkr
virtio_balloon
pvpanic
parport_pc
parport
ttm
drm_kms_helper
processor
thermal_sys
drm
button
autofs4
ext4
crc16
mbcache
jbd2
sg
sr_mod
cdrom
ata_generic
virtio_blk
virtio_net
floppy
ata_piix
libata
virtio_pci
virtio_ring
scsi_mod
uhci_hcd
ehci_hcd
i2c_piix4
i2c_core
virtio
psmouse
usbcore
usb_common
** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC
[Natoma] [8086:1237] (rev 02)
Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA
[Natoma/Triton II] [8086:7000]
Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE
[Natoma/Triton II] [8086:7010] (prog-if 80 [Master])
Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Region 0: [virtual] Memory at 000001f0 (32-bit,
non-prefetchable) [size=8]
Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
Region 2: [virtual] Memory at 00000170 (32-bit,
non-prefetchable) [size=8]
Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
Region 4: I/O ports at c0c0 [size=16]
Kernel driver in use: ata_piix
00:01.2 USB controller [0c03]: Intel Corporation 82371SB PIIX3 USB
[Natoma/Triton II] [8086:7020] (rev 01) (prog-if 00 [UHCI])
Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 11
Region 4: I/O ports at c040 [size=32]
Kernel driver in use: uhci_hcd
00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI
[8086:7113] (rev 03)
Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 9
Kernel driver in use: piix4_smbus
00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446
[1013:00b8] (prog-if 00 [VGA controller])
Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at febc0000 [disabled] [size=64K]
00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
[1af4:1002]
Subsystem: Red Hat, Inc Device [1af4:0005]
Physical Slot: 3
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at c060 [size=32]
Kernel driver in use: virtio-pci
00:04.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device
[1af4:1001]
Subsystem: Red Hat, Inc Device [1af4:0002]
Physical Slot: 4
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at c000 [size=64]
Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel driver in use: virtio-pci
00:05.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device
[1af4:1000]
Subsystem: Red Hat, Inc Device [1af4:0001]
Physical Slot: 5
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at c080 [size=32]
Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at feb40000 [disabled] [size=256K]
Capabilities: <access denied>
Kernel driver in use: virtio-pci
00:06.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device
[1af4:1000]
Subsystem: Red Hat, Inc Device [1af4:0001]
Physical Slot: 6
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at c0a0 [size=32]
Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at feb80000 [disabled] [size=256K]
Capabilities: <access denied>
Kernel driver in use: virtio-pci
** USB devices:
not available
-- System Information:
Debian Release: 8.8
APT prefers oldstable-updates
APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.16.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages linux-image-3.16.0-4-amd64 depends on:
ii debconf [debconf-2.0] 1.5.56
ii initramfs-tools [linux-initramfs-tool] 0.120+deb8u3
ii kmod 18-3
ii linux-base 3.5
Versions of packages linux-image-3.16.0-4-amd64 recommends:
pn firmware-linux-free <none>
pn irqbalance <none>
Versions of packages linux-image-3.16.0-4-amd64 suggests:
pn debian-kernel-handbook <none>
ii grub-pc 2.02~beta2-22+deb8u1
pn linux-doc-3.16 <none>
Versions of packages linux-image-3.16.0-4-amd64 is related to:
pn firmware-atheros <none>
pn firmware-bnx2 <none>
pn firmware-bnx2x <none>
pn firmware-brcm80211 <none>
pn firmware-intelwimax <none>
pn firmware-ipw2x00 <none>
pn firmware-ivtv <none>
pn firmware-iwlwifi <none>
pn firmware-libertas <none>
pn firmware-linux <none>
pn firmware-linux-nonfree <none>
pn firmware-myricom <none>
pn firmware-netxen <none>
pn firmware-qlogic <none>
pn firmware-ralink <none>
pn firmware-realtek <none>
pn xen-hypervisor <none>
-- debconf information:
linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64:
false
linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64:
true
linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64:
Reply to: