[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#870471: linux-image-3.16.0-4-amd64: random disk lock, jbd2/vda1-8 blocked for more than 120 seconds (kvm/drbd context)



Package: src:linux
Version: 3.16.43-2+deb8u2
Severity: normal

Dear Kernel Maintainers,

I randomly get disk lock on guest VMs on top of a ganeti cluster using
KVM/DRBD. Kernel logs on the guest say:
-----
task jbd2/vda1-8:107 blocked for more than 120 seconds.
[1814040.120033] INFO: task jbd2/vda1-8:107 blocked for more than 120
seconds.
[1814040.120798]       Not tainted 3.16.0-4-amd64 #1
[1814040.121297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814040.122132] jbd2/vda1-8     D ffff880036931948     0   107      2
0x00000000
[1814040.122139]  ffff8800369314f0 0000000000000046 0000000000012f40
ffff880139f8ffd8
[1814040.122144]  0000000000012f40 ffff8800369314f0 ffff880139f8fdc8
ffff880139f8fe60
[1814040.122148]  ffff8800368240b8 ffff8800369314f0 ffff880098d5be00
ffff880139f8fdb0
[1814040.122152] Call Trace:
[1814040.122171]  [<ffffffffa0160fb0>] ?
jbd2_journal_commit_transaction+0x260/0x1a30 [jbd2]
[1814040.122179]  [<ffffffff8105297b>] ? kvm_clock_read+0x1b/0x20
[1814040.122186]  [<ffffffff8101cab5>] ? sched_clock+0x5/0x10
…
-----

There is nothing in the logs (kernel.log, system.log,…) on the hosts.
Hosts and guests both use a stable Jessie kernel.
This report come from the guest having this issue.

The hosts have 256G of RAM and SSDs disks in a hard RAID10 and use
qemu-kvm 2.1+dfsg-12+deb8u6, drbd-utils 8.9.5-1~bpo8+1.

Some people talk about decreasing vm.dirty_background_ratio and
vm.dirty_ratio that could help:
*
https://forum.proxmox.com/threads/vm-blocked-due-to-hung_task_timeout_secs.22488/page-3#post-173136

I was thinking of upgrading the kernel to the jessie-backports version
because some people solve similar issue like this:
* https://phabricator.wikimedia.org/T129891#2853793

Do you think any of the solutions above could be helpful?
Is there an other way I can troubleshoot this issue?

Thanks in advance for your help and your time.

Best regards,
Thomas Pierson


-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc
version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64
root=UUID=e25af31e-8d5f-497e-b8d1-2e99b63bf41e ro text
console=ttyS0,38400n8 quiet

** Not tainted

** Kernel log:
[1814160.132439]  [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0
[1814160.132443]  [<ffffffff811aba72>] ? SyS_write+0x42/0xa0
[1814160.132450]  [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.132455] INFO: task master:684 blocked for more than 120 seconds.
[1814160.133139]       Not tainted 3.16.0-4-amd64 #1
[1814160.133636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.134468] master          D ffff8800bb16b9c8     0   684      1
0x00000004
[1814160.134472]  ffff8800bb16b570 0000000000000086 0000000000012f40
ffff8800bb243fd8
[1814160.134476]  0000000000012f40 ffff8800bb16b570 ffff880036824000
ffff880036824070
[1814160.134480]  0000000000021dad 0000000000000002 0000000000000000
ffff8801305c7358
[1814160.134484] Call Trace:
[1814160.134491]  [<ffffffffa015e074>] ?
wait_transaction_locked+0x74/0xa0 [jbd2]
[1814160.134495]  [<ffffffff810a9610>] ? prepare_to_wait_event+0xf0/0xf0
[1814160.134501]  [<ffffffffa015e3a1>] ? start_this_handle+0x281/0x5a0
[jbd2]
[1814160.134508]  [<ffffffffa015e8b4>] ? jbd2__journal_start+0xf4/0x200
[jbd2]
[1814160.134518]  [<ffffffffa0182e05>] ? ext4_dirty_inode+0x25/0x60 [ext4]
[1814160.134522]  [<ffffffff811d21e1>] ? __mark_inode_dirty+0x171/0x2b0
[1814160.134526]  [<ffffffff811c47b1>] ? update_time+0x81/0xc0
[1814160.134531]  [<ffffffff8106c5e2>] ? current_fs_time+0x12/0x60
[1814160.134534]  [<ffffffff811c49b0>] ? file_update_time+0x80/0xd0
[1814160.134540]  [<ffffffff811b2dc0>] ? pipe_write+0x3e0/0x450
[1814160.134545]  [<ffffffff811aa7f4>] ? new_sync_write+0x74/0xa0
[1814160.134549]  [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0
[1814160.134552]  [<ffffffff811aba72>] ? SyS_write+0x42/0xa0
[1814160.134557]  [<ffffffff811bf6ed>] ? SyS_poll+0x5d/0xf0
[1814160.134562]  [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.134566] INFO: task pickup:25876 blocked for more than 120 seconds.
[1814160.135273]       Not tainted 3.16.0-4-amd64 #1
[1814160.135770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.136615] pickup          D ffff880036f25908     0 25876    684
0x00000000
[1814160.136619]  ffff880036f254b0 0000000000000086 0000000000012f40
ffff8800baf2bfd8
[1814160.136623]  0000000000012f40 ffff880036f254b0 ffff880036824000
ffff880036824070
[1814160.136633]  0000000000021dad 0000000000000002 0000000000000000
ffff8801305c7058
[1814160.136663] Call Trace:
[1814160.136676]  [<ffffffffa015e074>] ?
wait_transaction_locked+0x74/0xa0 [jbd2]
[1814160.136681]  [<ffffffff810a9610>] ? prepare_to_wait_event+0xf0/0xf0
[1814160.136687]  [<ffffffffa015e3a1>] ? start_this_handle+0x281/0x5a0
[jbd2]
[1814160.136694]  [<ffffffffa015e8b4>] ? jbd2__journal_start+0xf4/0x200
[jbd2]
[1814160.136703]  [<ffffffffa0182e05>] ? ext4_dirty_inode+0x25/0x60 [ext4]
[1814160.136708]  [<ffffffff811d21e1>] ? __mark_inode_dirty+0x171/0x2b0
[1814160.136712]  [<ffffffff811c47b1>] ? update_time+0x81/0xc0
[1814160.136716]  [<ffffffff811c48e9>] ? touch_atime+0xf9/0x140
[1814160.136732]  [<ffffffff811b30b0>] ? pipe_read+0x280/0x2f0
[1814160.136738]  [<ffffffff811aa751>] ? new_sync_read+0x71/0xa0
[1814160.136742]  [<ffffffff811aada3>] ? vfs_read+0x93/0x170
[1814160.136750]  [<ffffffff811ab9d2>] ? SyS_read+0x42/0xa0
[1814160.136755]  [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.136762] INFO: task puppet:28066 blocked for more than 120 seconds.
[1814160.137464]       Not tainted 3.16.0-4-amd64 #1
[1814160.137960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.138798] puppet          D ffff8800bb16a728     0 28066    886
0x00000000
[1814160.138802]  ffff8800bb16a2d0 0000000000000082 0000000000012f40
ffff8800baf8ffd8
[1814160.138806]  0000000000012f40 ffff8800bb16a2d0 ffff88013fd137f0
ffff88013ffc3a10
[1814160.138809]  ffff8800baf8fac0 0000000000000002 ffffffff811da390
ffff8800bbbd67e0
[1814160.138813] Call Trace:
[1814160.138818]  [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.138822]  [<ffffffff81516f49>] ? io_schedule+0x99/0x120
[1814160.138826]  [<ffffffff811da39a>] ? sleep_on_buffer+0xa/0x10
[1814160.138831]  [<ffffffff815173d1>] ? __wait_on_bit_lock+0x41/0xa0
[1814160.138834]  [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.138839]  [<ffffffff815174a7>] ?
out_of_line_wait_on_bit_lock+0x77/0x90
[1814160.138843]  [<ffffffff810a9640>] ? autoremove_wake_function+0x30/0x30
[1814160.138851]  [<ffffffffa015fa10>] ? do_get_write_access+0x250/0x4f0
[jbd2]
[1814160.138858]  [<ffffffffa015fcd2>] ?
jbd2_journal_get_write_access+0x22/0x40 [jbd2]
[1814160.138870]  [<ffffffffa01a9fa6>] ?
__ext4_journal_get_write_access+0x36/0x80 [ext4]
[1814160.138879]  [<ffffffffa0178fe6>] ? __ext4_new_inode+0x436/0x1350
[ext4]
[1814160.138891]  [<ffffffffa0189a24>] ? ext4_create+0xc4/0x160 [ext4]
[1814160.138895]  [<ffffffff811b8e54>] ? vfs_create+0xc4/0x130
[1814160.138899]  [<ffffffff811b9f98>] ? do_last+0x10d8/0x1330
[1814160.138902]  [<ffffffff811ba2ab>] ? path_openat+0xbb/0x680
[1814160.138908]  [<ffffffff8116c30d>] ? handle_mm_fault+0x84d/0x1150
[1814160.138912]  [<ffffffff811bb01a>] ? do_filp_open+0x3a/0x90
[1814160.138916]  [<ffffffff811c722c>] ? __alloc_fd+0x7c/0x120
[1814160.138922]  [<ffffffff811aa169>] ? do_sys_open+0x129/0x220
[1814160.138927]  [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15
[1814160.138930] INFO: task cron:28670 blocked for more than 120 seconds.
[1814160.139613]       Not tainted 3.16.0-4-amd64 #1
[1814160.140122] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1814160.140955] cron            D ffff880139d347a8     0 28670    343
0x00000000
[1814160.140959]  ffff880139d34350 0000000000000082 0000000000012f40
ffff88013a20ffd8
[1814160.140962]  0000000000012f40 ffff880139d34350 ffff88013fc137f0
ffff88013ffcf830
[1814160.140966]  0000000000000002 ffffffff811da390 ffff88013a20fba0
0000000000000000
[1814160.140972] Call Trace:
[1814160.140977]  [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.140981]  [<ffffffff81516f49>] ? io_schedule+0x99/0x120
[1814160.140985]  [<ffffffff811da39a>] ? sleep_on_buffer+0xa/0x10
[1814160.140989]  [<ffffffff815172cc>] ? __wait_on_bit+0x5c/0x90
[1814160.140992]  [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50
[1814160.140997]  [<ffffffff81517377>] ? out_of_line_wait_on_bit+0x77/0x90
[1814160.141001]  [<ffffffff810a9640>] ? autoremove_wake_function+0x30/0x30
[1814160.141013]  [<ffffffffa0187ac7>] ? ext4_find_entry+0x117/0x4d0 [ext4]
[1814160.141023]  [<ffffffffa0188371>] ? ext4_lookup+0x41/0x140 [ext4]
[1814160.141029]  [<ffffffff811b3e39>] ? lookup_real+0x19/0x50
[1814160.141032]  [<ffffffff811b9920>] ? do_last+0xa60/0x1330
[1814160.141036]  [<ffffffff811ba2ab>] ? path_openat+0xbb/0x680
[1814160.141042]  [<ffffffff8116c0fc>] ? handle_mm_fault+0x63c/0x1150
[1814160.141046]  [<ffffffff811bb01a>] ? do_filp_open+0x3a/0x90
[1814160.141050]  [<ffffffff811c722c>] ? __alloc_fd+0x7c/0x120
[1814160.141055]  [<ffffffff811aa169>] ? do_sys_open+0x129/0x220
[1814160.141061]  [<ffffffff8151a48d>] ?
system_call_fast_compare_end+0x10/0x15

** Model information
sys_vendor: QEMU
product_name: Standard PC (i440FX + PIIX, 1996)
product_version: pc-i440fx-2.1
chassis_vendor: QEMU
chassis_version: pc-i440fx-2.1
bios_vendor: SeaBIOS
bios_version: 1.7.5-20140531_083030-gandalf

** Loaded modules:
xt_multiport
ip_vs
libcrc32c
crc32c_generic
binfmt_misc
nf_conntrack_ipv6
nf_defrag_ipv6
ip6table_filter
ip6_tables
xt_tcpudp
nf_conntrack_ipv4
nf_defrag_ipv4
xt_conntrack
nf_conntrack
iptable_filter
ip_tables
x_tables
ppdev
evdev
serio_raw
pcspkr
virtio_balloon
pvpanic
parport_pc
parport
ttm
drm_kms_helper
processor
thermal_sys
drm
button
autofs4
ext4
crc16
mbcache
jbd2
sg
sr_mod
cdrom
ata_generic
virtio_blk
virtio_net
floppy
ata_piix
libata
virtio_pci
virtio_ring
scsi_mod
uhci_hcd
ehci_hcd
i2c_piix4
i2c_core
virtio
psmouse
usbcore
usb_common


** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC
[Natoma] [8086:1237] (rev 02)
        Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA
[Natoma/Triton II] [8086:7000]
        Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE
[Natoma/Triton II] [8086:7010] (prog-if 80 [Master])
        Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 0: [virtual] Memory at 000001f0 (32-bit,
non-prefetchable) [size=8]
        Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
        Region 2: [virtual] Memory at 00000170 (32-bit,
non-prefetchable) [size=8]
        Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
        Region 4: I/O ports at c0c0 [size=16]
        Kernel driver in use: ata_piix

00:01.2 USB controller [0c03]: Intel Corporation 82371SB PIIX3 USB
[Natoma/Triton II] [8086:7020] (rev 01) (prog-if 00 [UHCI])
        Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin D routed to IRQ 11
        Region 4: I/O ports at c040 [size=32]
        Kernel driver in use: uhci_hcd

00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI
[8086:7113] (rev 03)
        Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 9
        Kernel driver in use: piix4_smbus

00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446
[1013:00b8] (prog-if 00 [VGA controller])
        Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
        Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at febc0000 [disabled] [size=64K]

00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
[1af4:1002]
        Subsystem: Red Hat, Inc Device [1af4:0005]
        Physical Slot: 3
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at c060 [size=32]
        Kernel driver in use: virtio-pci

00:04.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device
[1af4:1001]
        Subsystem: Red Hat, Inc Device [1af4:0002]
        Physical Slot: 4
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at c000 [size=64]
        Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: virtio-pci

00:05.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device
[1af4:1000]
        Subsystem: Red Hat, Inc Device [1af4:0001]
        Physical Slot: 5
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at c080 [size=32]
        Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at feb40000 [disabled] [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: virtio-pci

00:06.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device
[1af4:1000]
        Subsystem: Red Hat, Inc Device [1af4:0001]
        Physical Slot: 6
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at c0a0 [size=32]
        Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at feb80000 [disabled] [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: virtio-pci


** USB devices:
not available


-- System Information:
Debian Release: 8.8
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-3.16.0-4-amd64 depends on:
ii  debconf [debconf-2.0]                   1.5.56
ii  initramfs-tools [linux-initramfs-tool]  0.120+deb8u3
ii  kmod                                    18-3
ii  linux-base                              3.5

Versions of packages linux-image-3.16.0-4-amd64 recommends:
pn  firmware-linux-free  <none>
pn  irqbalance           <none>

Versions of packages linux-image-3.16.0-4-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-pc                 2.02~beta2-22+deb8u1
pn  linux-doc-3.16          <none>

Versions of packages linux-image-3.16.0-4-amd64 is related to:
pn  firmware-atheros        <none>
pn  firmware-bnx2           <none>
pn  firmware-bnx2x          <none>
pn  firmware-brcm80211      <none>
pn  firmware-intelwimax     <none>
pn  firmware-ipw2x00        <none>
pn  firmware-ivtv           <none>
pn  firmware-iwlwifi        <none>
pn  firmware-libertas       <none>
pn  firmware-linux          <none>
pn  firmware-linux-nonfree  <none>
pn  firmware-myricom        <none>
pn  firmware-netxen         <none>
pn  firmware-qlogic         <none>
pn  firmware-ralink         <none>
pn  firmware-realtek        <none>
pn  xen-hypervisor          <none>

-- debconf information:

linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64:
false

linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64:
true
  linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64:


Reply to: