
[Nbd] BUG: Linux kernel panic after umount of nbd device



I'm tracking a kernel panic after unmounting an nbd device.  Have you seen this before?


Description
-----------

Mount an nbd device, start a process that reads a file on it, umount,
run qemu-nbd -d, then kill the process: the kernel panics.

This can also happen after the reading process exits, maybe when a
buffer gets flushed?  We see it whenever a [jbd2/nbdXXX] process is
still running after the umount while the [nbdXXX] process has already
quit and no other process is using the nbd device.
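To check for the condition described above before running qemu-nbd -d, a small helper could scan /proc for a leftover journal thread. This is a minimal sketch, not part of the original report: the function name jbd2_thread_alive and the PROC_ROOT override (used only for testing) are hypothetical, and it assumes an ext4 internal journal, whose kernel thread is named jbd2/<dev>-<journal inode>, e.g. jbd2/nbd1-8.

```shell
# Hypothetical helper (not from the original report): succeed if a jbd2
# journal thread for the given device (e.g. "nbd1") is still running.
# Assumes an ext4 internal journal, whose kernel thread is named
# "jbd2/<dev>-<journal inode>", e.g. "jbd2/nbd1-8".
# PROC_ROOT is a hypothetical override so the function can be tested
# against a fake /proc tree; it defaults to the real /proc.
jbd2_thread_alive() {
    dev=$1
    proc=${PROC_ROOT:-/proc}
    for comm_file in "$proc"/[0-9]*/comm; do
        # Processes can exit during the scan; skip unreadable entries.
        comm=$(cat "$comm_file" 2>/dev/null) || continue
        case $comm in
            "jbd2/$dev"-*) return 0 ;;
        esac
    done
    return 1
}

# Example (as root), between "umount /mnt/1" and "qemu-nbd -d /dev/nbd1":
#   if jbd2_thread_alive nbd1; then
#       echo "jbd2/nbd1 still running; not disconnecting yet" >&2
#   fi
```

The trailing "-" in the match keeps nbd1 from also matching nbd10.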


Tested Versions
---------------
3.13, 3.18


Steps to Reproduce
------------------

    qemu-img create f.img 1G
    yes | mkfs.ext4 f.img
    modprobe nbd
    qemu-nbd -c /dev/nbd1 f.img
    sleep 1
    mkdir -p /mnt/1
    mount /dev/nbd1 /mnt/1
    date > /mnt/1/date
    ip netns add ns
    ip netns exec ns tail -f /mnt/1/date >/dev/null 2>&1 &   # NOTE1
    #mount -o remount,ro /mnt/1    # NOTE2
    umount /mnt/1
    qemu-nbd -d /dev/nbd1
    sleep 1
    kill %-

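NOTE1: umount fails with "device is busy" if the process is not in a netns
       (ip netns exec also gives the process its own mount namespace, which
       keeps the filesystem mounted there)
NOTE2: remounting read-only avoids the bug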
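Following NOTE2, a teardown order that remounts read-only before unmounting and disconnecting might look like the sketch below. It only encodes the workaround observed above; it is not a fix for the kernel bug. The function name nbd_teardown and the RUN variable (set to echo for a dry run) are hypothetical.

```shell
# Hypothetical teardown helper based on NOTE2: remount read-only first,
# then unmount, then disconnect the nbd device.
# Set RUN=echo to print the commands instead of executing them (dry run).
nbd_teardown() {
    mnt=$1
    dev=$2
    # NOTE2: remounting read-only avoids the panic in the reproduction above.
    ${RUN:-} mount -o remount,ro "$mnt" || return 1
    ${RUN:-} umount "$mnt" || return 1
    ${RUN:-} qemu-nbd -d "$dev"
}

# Example (as root): nbd_teardown /mnt/1 /dev/nbd1
```

Each step stops the sequence if it fails, so qemu-nbd -d is never run while the filesystem may still be mounted read-write.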


Dump
----

[  441.026251] nbd: registered device at major 43
[  442.168335] EXT4-fs (nbd1): mounted filesystem with ordered data mode. Opts: (null)
[  442.269904] block nbd1: NBD_DISCONNECT
[  442.288192] block nbd1: Receive control failed (result -32)
[  442.288938] block nbd1: queue cleared
[  443.308744] block nbd1: Attempted send on closed socket

(note: I inserted a dump_stack() call in nbd_handle_req here)

[  443.309307] CPU: 1 PID: 10506 Comm: tail Tainted: G            E  3.18.0+ #3
[  443.309310] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  443.309313]  ffff880212d1ba00 ffff880212cc39b8 ffffffff81764c1c 0000000000000001
[  443.309328]  ffff88021327a0c0 ffff880212cc39e8 ffffffffa04dbb2a ffff88021124cb50
[  443.309332]  0000000000000005 ffff8802038705c0 ffff8800da483000 ffff880212cc3a08
[  443.309336] Call Trace:
[  443.309355]  [<ffffffff81764c1c>] dump_stack+0x46/0x58
[  443.309362]  [<ffffffffa04dbb2a>] do_nbd_request+0x13a/0x185 [nbd]
[  443.309393]  [<ffffffff81351e37>] __blk_run_queue+0x37/0x50
[  443.309409]  [<ffffffff81356a13>] blk_queue_bio+0x323/0x380
[  443.309415]  [<ffffffff81351c80>] generic_make_request+0xc0/0x110
[  443.309421]  [<ffffffff81351d39>] submit_bio+0x69/0x130
[  443.309441]  [<ffffffff8134bbe6>] submit_bio_wait+0x56/0x70
[  443.309446]  [<ffffffff81357fae>] blkdev_issue_flush+0x5e/0x90
[  443.309463]  [<ffffffff81272561>] ext4_sync_fs+0xc1/0x180
[  443.309480]  [<ffffffff8120ce42>] sync_filesystem+0x82/0xb0
[  443.309486]  [<ffffffff811dee64>] generic_shutdown_super+0x34/0x100
[  443.309491]  [<ffffffff811df277>] kill_block_super+0x27/0x70
[  443.309507]  [<ffffffff811df589>] deactivate_locked_super+0x49/0x60
[  443.309512]  [<ffffffff811dfb5e>] deactivate_super+0x4e/0x70
[  443.309532]  [<ffffffff811fc3a3>] cleanup_mnt+0x43/0x90
[  443.309537]  [<ffffffff811fc442>] __cleanup_mnt+0x12/0x20
[  443.309542]  [<ffffffff8108bd34>] task_work_run+0xc4/0xe0
[  443.309558]  [<ffffffff81071729>] do_exit+0x2d9/0xa80
[  443.309593]  [<ffffffff810a49ce>] ? dequeue_task_fair+0x44e/0x660
[  443.309601]  [<ffffffff8107afdf>] ? recalc_sigpending+0x1f/0x60
[  443.309630]  [<ffffffff81071f5f>] do_group_exit+0x3f/0xa0
[  443.309636]  [<ffffffff8107dd63>] get_signal+0x1e3/0x730
[  443.309644]  [<ffffffff81012508>] do_signal+0x28/0xaa0
[  443.309674]  [<ffffffff810ae580>] ? prepare_to_wait_event+0x110/0x110
[  443.309681]  [<ffffffff811dcc9c>] ? vfs_read+0x9c/0x180
[  443.309710]  [<ffffffff81012fe9>] do_notify_resume+0x69/0xb0
[  443.309718]  [<ffffffff8176d30f>] int_signal+0x12/0x17

(this is the kernel panic stack dump that immediately follows)

[  443.309723] blk_update_request: I/O error, dev nbd1, sector 0
[  443.310773] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[  443.311501] IP: [<ffffffff81367276>] bdevname+0x6/0x30
[  443.312018] PGD 210f26067 PUD 210fa6067 PMD 0 
[  443.312982] Oops: 0000 [#1] SMP 
[  443.313454] Modules linked in: nbd(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_CHECKSUM(E) iptable_mangle(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) ebtable_nat(E) ebtables(E) x_tables(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) openvswitch(E) gre(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) libcrc32c(E) dm_crypt(E) ppdev(E) dm_multipath(E) scsi_dh(E) serio_raw(E) snd_intel8x0(E) snd_ac97_codec(E) ac97_bus(E) joydev(E) snd_pcm(E) snd_timer(E) snd(E) i2c_piix4(E) soundcore(E) parport_pc(E) parport(E) mac_hid(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) btrfs(E) xor(E) raid6_pq(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) ahci(E) libahci(E) e1000(E) video(E)
[  443.314743] CPU: 2 PID: 10506 Comm: tail Tainted: G            E  3.18.0+ #3
[  443.314743] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  443.314743] task: ffff8800d9f10000 ti: ffff880212cc0000 task.ti: ffff880212cc0000
[  443.314743] RIP: 0010:[<ffffffff81367276>]  [<ffffffff81367276>] bdevname+0x6/0x30
[  443.314743] RSP: 0018:ffff880212cc3a48  EFLAGS: 00010202
[  443.314743] RAX: 0000000000000001 RBX: ffff880072605f70 RCX: 0000000100008bf3
[  443.314743] RDX: 0000000000000001 RSI: ffff880212cc3a58 RDI: 0000000000000000
[  443.314743] RBP: ffff880212cc3a98 R08: 0000000000000206 R09: 0000000000000001
[  443.314743] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000020000
[  443.314743] R13: 0000000000001411 R14: 0000000000000000 R15: ffff880213714000
[  443.314743] FS:  00007fe1a45cd740(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
[  443.314743] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  443.314743] CR2: 0000000000000088 CR3: 0000000210f12000 CR4: 00000000000006e0
[  443.314743] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  443.314743] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  443.314743] Stack:
[  443.314743]  ffff880212cc3a98 ffffffff81210e74 ffff880212cc3a88 0000000000000000
[  443.314743]  ffff88021fd13640 0000000000000000 ffff8800da482800 ffff8800da482800
[  443.314743]  ffff880072605f70 0000000000001411 ffff880212cc3aa8 ffffffff81210eb0
[  443.314743] Call Trace:
[  443.314743]  [<ffffffff81210e74>] ? _submit_bh+0x1a4/0x1d0
[  443.314743]  [<ffffffff81210eb0>] submit_bh+0x10/0x20
[  443.314743]  [<ffffffff812add49>] jbd2_write_superblock+0x89/0x190
[  443.314743]  [<ffffffff812adef4>] jbd2_mark_journal_empty+0x64/0xa0
[  443.314743]  [<ffffffff812ae121>] jbd2_journal_destroy+0x1f1/0x220
[  443.314743]  [<ffffffff810ae580>] ? prepare_to_wait_event+0x110/0x110
[  443.314743]  [<ffffffff8127bd04>] ext4_put_super+0x64/0x350
[  443.314743]  [<ffffffff811deea6>] generic_shutdown_super+0x76/0x100
[  443.314743]  [<ffffffff811df277>] kill_block_super+0x27/0x70
[  443.314743]  [<ffffffff811df589>] deactivate_locked_super+0x49/0x60
[  443.314743]  [<ffffffff811dfb5e>] deactivate_super+0x4e/0x70
[  443.314743]  [<ffffffff811fc3a3>] cleanup_mnt+0x43/0x90
[  443.314743]  [<ffffffff811fc442>] __cleanup_mnt+0x12/0x20
[  443.314743]  [<ffffffff8108bd34>] task_work_run+0xc4/0xe0
[  443.314743]  [<ffffffff81071729>] do_exit+0x2d9/0xa80
[  443.314743]  [<ffffffff810a49ce>] ? dequeue_task_fair+0x44e/0x660
[  443.314743]  [<ffffffff8107afdf>] ? recalc_sigpending+0x1f/0x60
[  443.314743]  [<ffffffff81071f5f>] do_group_exit+0x3f/0xa0
[  443.314743]  [<ffffffff8107dd63>] get_signal+0x1e3/0x730
[  443.314743]  [<ffffffff81012508>] do_signal+0x28/0xaa0
[  443.314743]  [<ffffffff810ae580>] ? prepare_to_wait_event+0x110/0x110
[  443.314743]  [<ffffffff811dcc9c>] ? vfs_read+0x9c/0x180
[  443.314743]  [<ffffffff81012fe9>] do_notify_resume+0x69/0xb0
[  443.314743]  [<ffffffff8176d30f>] int_signal+0x12/0x17
[  443.314743] Code: 0c 48 c7 c2 ef 58 ae 81 48 89 df be 20 00 00 00 31 c0 e8 1e 71 02 00 48 89 d8 5b 41 5c 41 5d 41 5e 5d c3 66 90 0f 1f 44 00 00 55 <48> 8b 87 88 00 00 00 48 89 f2 48 8b bf 98 00 00 00 48 89 e5 8b 
[  443.314743] RIP  [<ffffffff81367276>] bdevname+0x6/0x30
[  443.314743]  RSP <ffff880212cc3a48>
[  443.314743] CR2: 0000000000000088
[  443.314743] ---[ end trace d56c02889646ab2e ]---
[  443.314743] Fixing recursive fault but reboot is needed!
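The register dump can be read directly: the bytes marked <48> in the Code: line, 48 8b 87 88 00 00 00, encode mov 0x88(%rdi),%rax; RDI holds bdevname()'s first argument (its struct block_device pointer) and is 0, so CR2 = 0x0 + 0x88 = 0x88. In other words bdevname() was called with a NULL block_device. The helper below is a hypothetical sketch (the name oops_fault_offset is made up) that pulls the first RDI: and CR2: values out of an oops and prints the difference; it is only meaningful when, as here, the faulting instruction dereferences RDI.

```shell
# Hypothetical helper: read an oops on stdin, take the first RDI: and CR2:
# values, and print CR2 - RDI. Only meaningful when the faulting
# instruction dereferences RDI (as bdevname+0x6 does here).
oops_fault_offset() {
    vals=$(awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "RDI:" && rdi == "") rdi = $(i + 1)
                if ($i == "CR2:" && cr2 == "") cr2 = $(i + 1)
            }
        }
        END { if (rdi != "" && cr2 != "") print rdi, cr2 }
    ')
    set -- $vals
    # Fail if either register was not found.
    [ $# -eq 2 ] || return 1
    printf 'RDI=%s CR2=%s offset=0x%x\n' "$1" "$2" $(( 0x$2 - 0x$1 ))
}

# Example: dmesg | oops_fault_offset
# For the dump above this prints: RDI=0000000000000000 CR2=0000000000000088 offset=0x88
```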

See Also
--------

[1] [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown
https://bbs.archlinux.org/viewtopic.php?id=153161

[2] Kernel oops when nbd device is removed before it is unmounted
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/861656


Cheers,
Noel Burton-Krahn
Piston Cloud Computing




