I'm tracking a kernel panic after umounting an nbd device. Have you seen this before?
Description
-----------
Mount an nbd device, start a process that reads a file on it, umount,
call qemu-nbd -d, then kill the process: the kernel panics.
This can also happen after the reading process exits, maybe when a
buffer gets flushed? We see it when a [jbd2/nbdXXX] process remains
running after a umount and the [nbdXXX] process has quit with no other
processes using the nbd device.
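To check for the leftover journal thread described above, something like the following sketch may help. The helper name find_stale_jbd2 is hypothetical, and it assumes the kernel thread shows up in the ps "comm" column with a name such as jbd2/nbd1-8. It reads "pid comm" lines on stdin so it can be tried on captured output as well as on a live system.

```shell
# Hypothetical helper: print pid and name of jbd2 journal threads that
# belong to nbd devices. Input is "pid comm" lines, one process per line.
find_stale_jbd2() {
    awk '$2 ~ /^jbd2\/nbd[0-9]+/ { print $1, $2 }'
}

# Live usage, after umount and qemu-nbd -d (assumes procps ps):
#   ps -eo pid,comm | find_stale_jbd2
```

Any output after the umount suggests the journal thread is still alive for a device that has already been disconnected.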
Tested Versions
---------------
3.13, 3.18
Steps to Reproduce
------------------
qemu-img create f.img 1G
yes | mkfs.ext4 f.img
modprobe nbd
qemu-nbd -c /dev/nbd1 f.img
sleep 1
mkdir -p /mnt/1
mount /dev/nbd1 /mnt/1
date > /mnt/1/date
ip netns add ns
ip netns exec ns tail -f /mnt/1/date >/dev/null 2>&1 & # NOTE1
#mount -o remount,ro /mnt/1 # NOTE2
umount /mnt/1
qemu-nbd -d /dev/nbd1
sleep 1
kill %-
NOTE1: umount will fail with "device busy" if the process is not in a netns
NOTE2: remounting readonly avoids the bug
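Based on NOTE2, a teardown order that remounts read-only first could be sketched as below. This is only a workaround sketch, not a fix for the underlying bug; the function name safe_nbd_teardown and the DRY_RUN variable are hypothetical and exist only for this illustration.

```shell
# Hypothetical wrapper: remount read-only, umount, then disconnect.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
safe_nbd_teardown() {
    mnt=$1
    dev=$2
    run=${DRY_RUN:+echo}
    $run mount -o remount,ro "$mnt" &&
    $run umount "$mnt" &&
    $run qemu-nbd -d "$dev"
}

# Example: DRY_RUN=1; safe_nbd_teardown /mnt/1 /dev/nbd1
```

The steps are chained with && so the device is not disconnected if the remount or umount fails.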
Dump
----
[ 441.026251] nbd: registered device at major 43
[ 442.168335] EXT4-fs (nbd1): mounted filesystem with ordered data mode. Opts: (null)
[ 442.269904] block nbd1: NBD_DISCONNECT
[ 442.288192] block nbd1: Receive control failed (result -32)
[ 442.288938] block nbd1: queue cleared
[ 443.308744] block nbd1: Attempted send on closed socket
(note: I inserted dump_stack() in nbd_handle_req here)
[ 443.309307] CPU: 1 PID: 10506 Comm: tail Tainted: G E 3.18.0+ #3
[ 443.309310] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 443.309313] ffff880212d1ba00 ffff880212cc39b8 ffffffff81764c1c 0000000000000001
[ 443.309328] ffff88021327a0c0 ffff880212cc39e8 ffffffffa04dbb2a ffff88021124cb50
[ 443.309332] 0000000000000005 ffff8802038705c0 ffff8800da483000 ffff880212cc3a08
[ 443.309336] Call Trace:
[ 443.309355] [<ffffffff81764c1c>] dump_stack+0x46/0x58
[ 443.309362] [<ffffffffa04dbb2a>] do_nbd_request+0x13a/0x185 [nbd]
[ 443.309393] [<ffffffff81351e37>] __blk_run_queue+0x37/0x50
[ 443.309409] [<ffffffff81356a13>] blk_queue_bio+0x323/0x380
[ 443.309415] [<ffffffff81351c80>] generic_make_request+0xc0/0x110
[ 443.309421] [<ffffffff81351d39>] submit_bio+0x69/0x130
[ 443.309441] [<ffffffff8134bbe6>] submit_bio_wait+0x56/0x70
[ 443.309446] [<ffffffff81357fae>] blkdev_issue_flush+0x5e/0x90
[ 443.309463] [<ffffffff81272561>] ext4_sync_fs+0xc1/0x180
[ 443.309480] [<ffffffff8120ce42>] sync_filesystem+0x82/0xb0
[ 443.309486] [<ffffffff811dee64>] generic_shutdown_super+0x34/0x100
[ 443.309491] [<ffffffff811df277>] kill_block_super+0x27/0x70
[ 443.309507] [<ffffffff811df589>] deactivate_locked_super+0x49/0x60
[ 443.309512] [<ffffffff811dfb5e>] deactivate_super+0x4e/0x70
[ 443.309532] [<ffffffff811fc3a3>] cleanup_mnt+0x43/0x90
[ 443.309537] [<ffffffff811fc442>] __cleanup_mnt+0x12/0x20
[ 443.309542] [<ffffffff8108bd34>] task_work_run+0xc4/0xe0
[ 443.309558] [<ffffffff81071729>] do_exit+0x2d9/0xa80
[ 443.309593] [<ffffffff810a49ce>] ? dequeue_task_fair+0x44e/0x660
[ 443.309601] [<ffffffff8107afdf>] ? recalc_sigpending+0x1f/0x60
[ 443.309630] [<ffffffff81071f5f>] do_group_exit+0x3f/0xa0
[ 443.309636] [<ffffffff8107dd63>] get_signal+0x1e3/0x730
[ 443.309644] [<ffffffff81012508>] do_signal+0x28/0xaa0
[ 443.309674] [<ffffffff810ae580>] ? prepare_to_wait_event+0x110/0x110
[ 443.309681] [<ffffffff811dcc9c>] ? vfs_read+0x9c/0x180
[ 443.309710] [<ffffffff81012fe9>] do_notify_resume+0x69/0xb0
[ 443.309718] [<ffffffff8176d30f>] int_signal+0x12/0x17
(this is the kernel panic stack dump immediately after)
[ 443.309723] blk_update_request: I/O error, dev nbd1, sector 0
[ 443.310773] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[ 443.311501] IP: [<ffffffff81367276>] bdevname+0x6/0x30
[ 443.312018] PGD 210f26067 PUD 210fa6067 PMD 0
[ 443.312982] Oops: 0000 [#1] SMP
[ 443.313454] Modules linked in: nbd(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_CHECKSUM(E) iptable_mangle(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) ebtable_nat(E) ebtables(E) x_tables(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) openvswitch(E) gre(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) libcrc32c(E) dm_crypt(E) ppdev(E) dm_multipath(E) scsi_dh(E) serio_raw(E) snd_intel8x0(E) snd_ac97_codec(E) ac97_bus(E) joydev(E) snd_pcm(E) snd_timer(E) snd(E) i2c_piix4(E) soundcore(E) parport_pc(E) parport(E) mac_hid(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) btrfs(E) xor(E) raid6_pq(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) ahci(E) libahci(E) e1000(E) video(E)
[ 443.314743] CPU: 2 PID: 10506 Comm: tail Tainted: G E 3.18.0+ #3
[ 443.314743] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 443.314743] task: ffff8800d9f10000 ti: ffff880212cc0000 task.ti: ffff880212cc0000
[ 443.314743] RIP: 0010:[<ffffffff81367276>] [<ffffffff81367276>] bdevname+0x6/0x30
[ 443.314743] RSP: 0018:ffff880212cc3a48 EFLAGS: 00010202
[ 443.314743] RAX: 0000000000000001 RBX: ffff880072605f70 RCX: 0000000100008bf3
[ 443.314743] RDX: 0000000000000001 RSI: ffff880212cc3a58 RDI: 0000000000000000
[ 443.314743] RBP: ffff880212cc3a98 R08: 0000000000000206 R09: 0000000000000001
[ 443.314743] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000020000
[ 443.314743] R13: 0000000000001411 R14: 0000000000000000 R15: ffff880213714000
[ 443.314743] FS: 00007fe1a45cd740(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
[ 443.314743] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 443.314743] CR2: 0000000000000088 CR3: 0000000210f12000 CR4: 00000000000006e0
[ 443.314743] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 443.314743] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 443.314743] Stack:
[ 443.314743] ffff880212cc3a98 ffffffff81210e74 ffff880212cc3a88 0000000000000000
[ 443.314743] ffff88021fd13640 0000000000000000 ffff8800da482800 ffff8800da482800
[ 443.314743] ffff880072605f70 0000000000001411 ffff880212cc3aa8 ffffffff81210eb0
[ 443.314743] Call Trace:
[ 443.314743] [<ffffffff81210e74>] ? _submit_bh+0x1a4/0x1d0
[ 443.314743] [<ffffffff81210eb0>] submit_bh+0x10/0x20
[ 443.314743] [<ffffffff812add49>] jbd2_write_superblock+0x89/0x190
[ 443.314743] [<ffffffff812adef4>] jbd2_mark_journal_empty+0x64/0xa0
[ 443.314743] [<ffffffff812ae121>] jbd2_journal_destroy+0x1f1/0x220
[ 443.314743] [<ffffffff810ae580>] ? prepare_to_wait_event+0x110/0x110
[ 443.314743] [<ffffffff8127bd04>] ext4_put_super+0x64/0x350
[ 443.314743] [<ffffffff811deea6>] generic_shutdown_super+0x76/0x100
[ 443.314743] [<ffffffff811df277>] kill_block_super+0x27/0x70
[ 443.314743] [<ffffffff811df589>] deactivate_locked_super+0x49/0x60
[ 443.314743] [<ffffffff811dfb5e>] deactivate_super+0x4e/0x70
[ 443.314743] [<ffffffff811fc3a3>] cleanup_mnt+0x43/0x90
[ 443.314743] [<ffffffff811fc442>] __cleanup_mnt+0x12/0x20
[ 443.314743] [<ffffffff8108bd34>] task_work_run+0xc4/0xe0
[ 443.314743] [<ffffffff81071729>] do_exit+0x2d9/0xa80
[ 443.314743] [<ffffffff810a49ce>] ? dequeue_task_fair+0x44e/0x660
[ 443.314743] [<ffffffff8107afdf>] ? recalc_sigpending+0x1f/0x60
[ 443.314743] [<ffffffff81071f5f>] do_group_exit+0x3f/0xa0
[ 443.314743] [<ffffffff8107dd63>] get_signal+0x1e3/0x730
[ 443.314743] [<ffffffff81012508>] do_signal+0x28/0xaa0
[ 443.314743] [<ffffffff810ae580>] ? prepare_to_wait_event+0x110/0x110
[ 443.314743] [<ffffffff811dcc9c>] ? vfs_read+0x9c/0x180
[ 443.314743] [<ffffffff81012fe9>] do_notify_resume+0x69/0xb0
[ 443.314743] [<ffffffff8176d30f>] int_signal+0x12/0x17
[ 443.314743] Code: 0c 48 c7 c2 ef 58 ae 81 48 89 df be 20 00 00 00 31 c0 e8 1e 71 02 00 48 89 d8 5b 41 5c 41 5d 41 5e 5d c3 66 90 0f 1f 44 00 00 55 <48> 8b 87 88 00 00 00 48 89 f2 48 8b bf 98 00 00 00 48 89 e5 8b
[ 443.314743] RIP [<ffffffff81367276>] bdevname+0x6/0x30
[ 443.314743] RSP <ffff880212cc3a48>
[ 443.314743] CR2: 0000000000000088
[ 443.314743] ---[ end trace d56c02889646ab2e ]---
[ 443.314743] Fixing recursive fault but reboot is needed!
See Also
--------
[1] [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown
[2] Kernel oops when nbd device is removed before it is unmounted
Cheers,
Noel Burton-Krahn
Piston Cloud Computing