[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#794680: drbd kernel module incompatible with drbd-utils -> kernel panics



Package: linux
Version: 3.16.7-ckt11-1+deb8u2
Severity: critical

Hi,

TL;DR - please provide a kernel with a newer drbd module (e.g. 8.4.6),
as the current version is incompatible with stable's drbd-utils and
will result in kernel panics under load.

I have the following kernel:

Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u2 (2015-07-17)

This ships with version 8.4.3 of the drbd kernel module (which
advertises '(api:1/proto:86-101)').

Using that version of the module with stable's drbd-utils (8.9.2rc1)
results in kernel panics under heavy I/O load, fairly repeatedly. A
kernel log of a typical crash is attached to this report. I intially
reported this issue to Xen (since it happened in a dom0), and they
referred me to this blog post:

http://blog.chinewalking.com/drbd-kernel-oops-w-trim/

Notably, you will observe that drbd-module >=8.4.4 supports "trim", whereas
8.4.3 does not. Yet the userland tools arrange to use trim anyway:

Aug  4 14:28:24 ophon kernel: [2856757.049680] drbd mws-02474: Agreed to support TRIM on protocol level 

Following that suggestion, I installed the kernel module 8.4.6 from
upstream, and the kernel has stopped panicking.

You might argue that drbd upstream's api/proto discrimination is
inadequate (and perhaps a bug report should go there), but nonetheless
kernel panics are a serious flaw in the kernel (or the offending
module) IMAO.

Regards,

Matthew

-- System Information:
Debian Release: 8.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-backports'), (500, 'stable\
')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Aug  3 16:03:13 opus kernel: [ 1250.026811] drbd mws-priv-1: Starting worker thread (from drbdsetup-84 [12987])
Aug  3 16:03:13 opus kernel: [ 1250.027313] block drbd4: disk( Diskless -> Attaching ) 
Aug  3 16:03:13 opus kernel: [ 1250.027409] drbd mws-priv-1: Method to ensure write ordering: flush
Aug  3 16:03:13 opus kernel: [ 1250.027413] block drbd4: max BIO size = 4096
Aug  3 16:03:13 opus kernel: [ 1250.027418] block drbd4: drbd_bm_resize called with capacity == 41941688
Aug  3 16:03:13 opus kernel: [ 1250.027558] block drbd4: resync bitmap: bits=5242711 words=81918 pages=160
Aug  3 16:03:13 opus kernel: [ 1250.027561] block drbd4: size = 20 GB (20970844 KB)
Aug  3 16:03:13 opus kernel: [ 1250.032268] block drbd4: Writing the whole bitmap, size changed
Aug  3 16:03:13 opus kernel: [ 1250.047827] block drbd4: bitmap WRITE of 160 pages took 4 jiffies
Aug  3 16:03:13 opus kernel: [ 1250.061634] block drbd4: 20 GB (5242711 bits) marked out-of-sync by on disk bit-map.
Aug  3 16:03:13 opus kernel: [ 1250.180186] block drbd4: bitmap READ of 160 pages took 2 jiffies
Aug  3 16:03:13 opus kernel: [ 1250.180291] block drbd4: recounting of set bits took additional 0 jiffies
Aug  3 16:03:13 opus kernel: [ 1250.180293] block drbd4: 20 GB (5242711 bits) marked out-of-sync by on disk bit-map.
Aug  3 16:03:13 opus kernel: [ 1250.180304] block drbd4: Suspended AL updates
Aug  3 16:03:13 opus kernel: [ 1250.180307] block drbd4: disk( Attaching -> Inconsistent ) 
Aug  3 16:03:13 opus kernel: [ 1250.180310] block drbd4: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000
Aug  3 16:03:13 opus kernel: [ 1250.191161] drbd mws-priv-1: conn( StandAlone -> Unconnected ) 
Aug  3 16:03:13 opus kernel: [ 1250.191183] drbd mws-priv-1: Starting receiver thread (from drbd_w_mws-priv [12989])
Aug  3 16:03:13 opus kernel: [ 1250.191345] drbd mws-priv-1: receiver (re)started
Aug  3 16:03:13 opus kernel: [ 1250.191360] drbd mws-priv-1: conn( Unconnected -> WFConnection ) 
Aug  3 16:03:13 opus kernel: [ 1250.689576] drbd mws-priv-1: Handshake successful: Agreed network protocol version 101
Aug  3 16:03:13 opus kernel: [ 1250.689580] drbd mws-priv-1: Agreed to support TRIM on protocol level
Aug  3 16:03:13 opus kernel: [ 1250.689616] drbd mws-priv-1: conn( WFConnection -> WFReportParams ) 
Aug  3 16:03:13 opus kernel: [ 1250.689631] drbd mws-priv-1: Starting asender thread (from drbd_r_mws-priv [12992])
Aug  3 16:03:13 opus kernel: [ 1250.737084] block drbd4: max BIO size = 1048576
Aug  3 16:03:13 opus kernel: [ 1250.737091] block drbd4: drbd_sync_handshake:
Aug  3 16:03:13 opus kernel: [ 1250.737094] block drbd4: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:5242711 flags:0
Aug  3 16:03:13 opus kernel: [ 1250.737096] block drbd4: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:5242711 flags:0
Aug  3 16:03:13 opus kernel: [ 1250.737098] block drbd4: uuid_compare()=0 by rule 10
Aug  3 16:03:13 opus kernel: [ 1250.737100] block drbd4: No resync, but 5242711 bits in bitmap!
Aug  3 16:03:13 opus kernel: [ 1250.737105] block drbd4: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent ) 
Aug  3 16:03:13 opus kernel: [ 1250.737109] block drbd4: Resumed AL updates
Aug  3 16:03:14 opus kernel: [ 1250.773903] block drbd4: Accepted new current UUID, preparing to skip initial sync
Aug  3 16:03:14 opus kernel: [ 1250.777061] block drbd4: bitmap WRITE of 160 pages took 1 jiffies
Aug  3 16:03:14 opus kernel: [ 1250.788564] block drbd4: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Aug  3 16:03:14 opus kernel: [ 1250.788573] block drbd4: disk( Inconsistent -> UpToDate ) pdsk( Inconsistent -> UpToDate ) 
Aug  3 16:03:14 opus kernel: [ 1250.797104] block drbd4: receiver updated UUIDs to 14460554106EF79A:0000000000000000:0000000000000000:0000000000000000
Aug  3 16:03:14 opus kernel: [ 1250.797117] block drbd4: peer( Secondary -> Primary ) 
Aug  3 16:03:15 opus kernel: [ 1251.748952] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Aug  3 16:03:15 opus kernel: [ 1251.748983] BUG: unable to handle kernel paging request at ffff8800022f3d88
Aug  3 16:03:15 opus kernel: [ 1251.749016] IP: [<ffff8800022f3d88>] 0xffff8800022f3d88
Aug  3 16:03:15 opus kernel: [ 1251.749041] PGD 1814067 PUD 1815067 PMD 2f81067 PTE 80100000022f3067
Aug  3 16:03:15 opus kernel: [ 1251.749082] Oops: 0011 [#1] SMP 
Aug  3 16:03:15 opus kernel: [ 1251.749106] Modules linked in: xt_physdev iptable_filter ip_tables x_tables xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge stp llc intel_powerclamp coretemp crc32_pclmul ghash_clmulni_intel joydev hid_generic iTCO_wdt iTCO_vendor_support aesni_intel evdev aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd usbhid ttm hid drm_kms_helper pcspkr drm i2c_i801 lpc_ich mfd_core i7core_edac ioatdma edac_core tpm_tis tpm ipmi_si ipmi_msghandler button shpchp processor thermal_sys drbd lru_cache libcrc32c autofs4 ext4 crc16 mbcache jbd2 dm_mod raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata ehci_pci uhci_hcd ehci_hcd scsi_mod usbcore usb_common igb i2c_algo_bit i2c_core dca ptp pps_core
Aug  3 16:03:15 opus kernel: [ 1251.749691] CPU: 0 PID: 12993 Comm: drbd_a_mws-priv Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u2
Aug  3 16:03:15 opus kernel: [ 1251.749726] Hardware name: Intel Corporation S5500WBV/S5500WB, BIOS S5500.86B.01.00.0061.030920121535 03/09/2012
Aug  3 16:03:15 opus kernel: [ 1251.749765] task: ffff8800171742d0 ti: ffff8800022f0000 task.ti: ffff8800022f0000
Aug  3 16:03:15 opus kernel: [ 1251.749847] RIP: e030:[<ffff8800022f3d88>]  [<ffff8800022f3d88>] 0xffff8800022f3d88
Aug  3 16:03:15 opus kernel: [ 1251.749934] RSP: e02b:ffff8800022f3d90  EFLAGS: 00010212
Aug  3 16:03:15 opus kernel: [ 1251.749984] RAX: 00000000fffffffc RBX: ffffffffffffffff RCX: 0000000000000113
Aug  3 16:03:15 opus kernel: [ 1251.750039] RDX: 0000000000000113 RSI: 00000000fffffe01 RDI: ffffffff81463f75
Aug  3 16:03:15 opus kernel: [ 1251.750094] RBP: ffff8800171742d0 R08: ffff8800022f0000 R09: 0000000000000000
Aug  3 16:03:15 opus kernel: [ 1251.750150] R10: ffff88001751b890 R11: 0000000000000000 R12: 0000000000000001
Aug  3 16:03:15 opus kernel: [ 1251.750205] R13: 0000000000000000 R14: 0000000000000010 R15: ffff880016c92000
Aug  3 16:03:15 opus kernel: [ 1251.750263] FS:  00007f90c43c0740(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
Aug  3 16:03:15 opus kernel: [ 1251.750348] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug  3 16:03:15 opus kernel: [ 1251.750399] CR2: ffff8800022f3d88 CR3: 000000001073f000 CR4: 0000000000002660
Aug  3 16:03:15 opus kernel: [ 1251.750454] Stack:
Aug  3 16:03:15 opus kernel: [ 1251.750493]  ffff8800022f3d88 0000000000000010 0000000000000000 0000000000000000
Aug  3 16:03:15 opus kernel: [ 1251.750589]  ffff8800022f3d90 0000000000000001 0000000000000000 0000000000000000
Aug  3 16:03:15 opus kernel: [ 1251.750686]  0000000000004100 ffffffffa02577be ffff880016c92080 0000001000000000
Aug  3 16:03:15 opus kernel: [ 1251.750783] Call Trace:
Aug  3 16:03:15 opus kernel: [ 1251.750829]  [<ffffffffa02577be>] ? drbd_asender+0x27e/0x750 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.750887]  [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.750947]  [<ffffffffa0260d46>] ? drbd_thread_setup+0x46/0x130 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.751006]  [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.751065]  [<ffffffff81087fad>] ? kthread+0xbd/0xe0
Aug  3 16:03:15 opus kernel: [ 1251.751114]  [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug  3 16:03:15 opus kernel: [ 1251.751170]  [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
Aug  3 16:03:15 opus kernel: [ 1251.751221]  [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug  3 16:03:15 opus kernel: [ 1251.751274] Code: ff ff ff ff ff ff ff ff ff ff ff 88 3d 2f 02 00 88 ff ff 30 e0 00 00 00 00 00 00 12 02 01 00 00 00 00 00 90 3d 2f 02 00 88 ff ff <2b> e0 00 00 00 00 00 00 88 3d 2f 02 00 88 ff ff 10 00 00 00 00 
Aug  3 16:03:15 opus kernel: [ 1251.751694] RIP  [<ffff8800022f3d88>] 0xffff8800022f3d88
Aug  3 16:03:15 opus kernel: [ 1251.751747]  RSP <ffff8800022f3d90>
Aug  3 16:03:15 opus kernel: [ 1251.751790] CR2: ffff8800022f3d88
Aug  3 16:03:15 opus kernel: [ 1251.752128] ---[ end trace 975e04f66c2d9004 ]---
Aug  3 16:03:15 opus kernel: [ 1251.835012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Aug  3 16:03:15 opus kernel: [ 1251.835235] IP: [<ffffffffa02453bd>] drbd_endio_write_sec_final+0x9d/0x480 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.835408] PGD 0 
Aug  3 16:03:15 opus kernel: [ 1251.835531] Oops: 0002 [#2] SMP 
Aug  3 16:03:15 opus kernel: [ 1251.835704] Modules linked in: xt_physdev iptable_filter ip_tables x_tables xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge stp llc intel_powerclamp coretemp crc32_pclmul ghash_clmulni_intel joydev hid_generic iTCO_wdt iTCO_vendor_support aesni_intel evdev aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd usbhid ttm hid drm_kms_helper pcspkr drm i2c_i801 lpc_ich mfd_core i7core_edac ioatdma edac_core tpm_tis tpm ipmi_si ipmi_msghandler button shpchp processor thermal_sys drbd lru_cache libcrc32c autofs4 ext4 crc16 mbcache jbd2 dm_mod raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata ehci_pci uhci_hcd ehci_hcd scsi_mod usbcore usb_common igb i2c_algo_bit i2c_core dca ptp pps_core
Aug  3 16:03:15 opus kernel: [ 1251.840379] CPU: 0 PID: 12992 Comm: drbd_r_mws-priv Tainted: G      D       3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u2
Aug  3 16:03:15 opus kernel: [ 1251.840515] Hardware name: Intel Corporation S5500WBV/S5500WB, BIOS S5500.86B.01.00.0061.030920121535 03/09/2012
Aug  3 16:03:15 opus kernel: [ 1251.840648] task: ffff880016c1a050 ti: ffff8800173e4000 task.ti: ffff8800173e4000
Aug  3 16:03:15 opus kernel: [ 1251.840771] RIP: e030:[<ffffffffa02453bd>]  [<ffffffffa02453bd>] drbd_endio_write_sec_final+0x9d/0x480 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.840951] RSP: e02b:ffff8800173e7ce0  EFLAGS: 00010097
Aug  3 16:03:15 opus kernel: [ 1251.841041] RAX: 0000000000000000 RBX: ffff880017439700 RCX: 000000000000009c
Aug  3 16:03:15 opus kernel: [ 1251.841137] RDX: 0000000000000000 RSI: ffff88000c850200 RDI: ffff88000c85bed0
Aug  3 16:03:15 opus kernel: [ 1251.841233] RBP: ffff88000cb22800 R08: 0000000000000cce R09: ffff88000c850200
Aug  3 16:03:15 opus kernel: [ 1251.841329] R10: 0000000000007ff0 R11: 0000000000000000 R12: ffff880002b676a0
Aug  3 16:03:15 opus kernel: [ 1251.841424] R13: ffff88001f8463b0 R14: ffff88000cb22bb0 R15: ffff88000cb22800
Aug  3 16:03:15 opus kernel: [ 1251.841523] FS:  00007f90c43c0740(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
Aug  3 16:03:15 opus kernel: [ 1251.841648] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug  3 16:03:15 opus kernel: [ 1251.841739] CR2: 0000000000000008 CR3: 000000001073f000 CR4: 0000000000002660
Aug  3 16:03:15 opus kernel: [ 1251.841835] Stack:
Aug  3 16:03:15 opus kernel: [ 1251.841913]  0000000000000000 0000000000030003 ffff8800174397b8 0000000000000000
Aug  3 16:03:15 opus kernel: [ 1251.842216]  0000000000000000 0000000000102800 0000000000400000 0000000000104800
Aug  3 16:03:15 opus kernel: [ 1251.842518]  0000000000000000 0000000000102800 0000000000000000 0000000000000000
Aug  3 16:03:15 opus kernel: [ 1251.842820] Call Trace:
Aug  3 16:03:15 opus kernel: [ 1251.842906]  [<ffffffffa0254bb5>] ? drbd_submit_peer_request+0x85/0x330 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.843033]  [<ffffffffa02556ea>] ? receive_Data+0x36a/0xe40 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.843130]  [<ffffffffa0257407>] ? drbd_receiver+0x117/0x250 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.843228]  [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.843328]  [<ffffffffa0260d46>] ? drbd_thread_setup+0x46/0x130 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.843427]  [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.843524]  [<ffffffff81087fad>] ? kthread+0xbd/0xe0
Aug  3 16:03:15 opus kernel: [ 1251.843614]  [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug  3 16:03:15 opus kernel: [ 1251.843709]  [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
Aug  3 16:03:15 opus kernel: [ 1251.843801]  [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug  3 16:03:15 opus kernel: [ 1251.843894] Code: 04 48 8b 45 00 48 8d b8 d0 00 00 00 e8 dd bc 2c e1 8b 53 58 49 89 c1 c1 ea 09 01 95 54 02 00 00 49 83 fd ff 48 8b 13 48 8b 43 08 <48> 89 42 08 48 89 10 48 8d 85 c0 03 00 00 48 8b 95 c8 03 00 00 
Aug  3 16:03:15 opus kernel: [ 1251.846996] RIP  [<ffffffffa02453bd>] drbd_endio_write_sec_final+0x9d/0x480 [drbd]
Aug  3 16:03:15 opus kernel: [ 1251.847168]  RSP <ffff8800173e7ce0>
Aug  3 16:03:15 opus kernel: [ 1251.847252] CR2: 0000000000000008
Aug  3 16:03:15 opus kernel: [ 1251.847335] ---[ end trace 975e04f66c2d9005 ]---

Reply to: