
Bug#735139: nbd recovery after suspend/resume



Ernesto reported that nbd mounts break after suspend/resume when running
Linux 3.2.51:

> [48080.515468] block nbd1: Attempted send on closed socket
> [48080.515473] end_request: I/O error, dev nbd1, sector 91896
> [48080.515718] block nbd1: Attempted send on closed socket
> [48080.515721] end_request: I/O error, dev nbd1, sector 91896
> [48080.515752] ------------[ cut here ]------------
> [48080.515863] kernel BUG at /build/linux-rrsxby/linux-3.2.51/fs/buffer.c:2917!
> [48080.516010] invalid opcode: 0000 [#1] SMP 
> [48080.516176] CPU 0 
> [48080.516188] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi nls_utf8 nls_cp437 vfat fat nbd cbc ecb vmnet(O) vsock(O) vmci(O) vmmon(O) parport_pc ppdev lp parport cpufreq_conservative bnep cpufreq_userspace cpufreq_stats cpufreq_powersave rfcomm 8021q garp stp binfmt_misc uinput nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop fuse ecryptfs dm_crypt dm_mod snd_hda_codec_hdmi snd_hda_codec_conexant pl2303 usbserial arc4 iwlwifi joydev btusb mac80211 bluetooth snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i915 drm_kms_helper snd_page_alloc drm iTCO_wdt iTCO_vendor_support snd_seq cfg80211 snd_seq_device snd_timer snd evdev soundcore i2c_i801 dell_laptop i2c_algo_bit i2c_core rfkill coretemp acpi_cpufreq mperf video pcspkr dcdbas psmouse dell_wmi ac serio_raw sparse_keymap processor button battery power_supply wmi ext4 crc16 jbd2 mbcache usbhid hid ums_realtek usb_storage sg sr_mod sd_mod cdrom crc_t10dif xhci_hcd crc32c_intel ghash_clmulni_intel aesni_intel ahci libahci aes_x86_64 thermal thermal_sys libata atl1c scsi_mod ehci_hcd aes_generic cryptd usbcore usb_common [last unloaded: scsi_wait_scan]
> [48080.520191] 
> [48080.520931] Pid: 7672, comm: make Tainted: G           O 3.2.0-4-amd64 #1 Debian 3.2.51-1 Dell Inc.          Dell System Inspiron N411Z/      
> [48080.521803] RIP: 0010:[<ffffffff8111ccc3>]  [<ffffffff8111ccc3>] submit_bh+0x19/0xff
> [48080.522674] RSP: 0018:ffff88017a5e5a68  EFLAGS: 00010246
> [48080.523557] RAX: 0000000000040005 RBX: ffff8800c947af68 RCX: 0000000000000004
> [48080.524480] RDX: 0000000000000000 RSI: ffff8800c947af68 RDI: 0000000000000211
> [48080.525417] RBP: 0000000000000211 R08: 0000000000000200 R09: ffffffff8168f0a0
> [48080.526246] R10: ffff880107a798c0 R11: ffff880107a798c0 R12: ffff8800c919e400
> [48080.527186] R13: 0000000000000001 R14: 000000000001f381 R15: 0000000003c94245
> [48080.528204] FS:  00007fea81a02700(0000) GS:ffff88019fa00000(0000) knlGS:0000000000000000
> [48080.529252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [48080.530326] CR2: 00000000019c8000 CR3: 00000001613a1000 CR4: 00000000000406f0
> [48080.531435] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [48080.532557] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [48080.533691] Process make (pid: 7672, threadinfo ffff88017a5e4000, task ffff8800d066f650)
> [48080.534863] Stack:
> [48080.536040]  ffff8800c947af68 0000000000000211 ffff8800c919e400 ffffffff8111f577
> [48080.537291]  ffff8800d066f650 ffffffff811be4ff ffff8800c947af68 ffff8800c947af68
> [48080.538558]  ffff88015c1cac00 ffffffffa01c4fcd ffffffffa01e5e1d ffff88000dd8e840
> [48080.539848] Call Trace:
> [48080.541140]  [<ffffffff8111f577>] ? __sync_dirty_buffer+0x52/0x87
> [48080.542474]  [<ffffffff811be4ff>] ? __percpu_counter_sum+0x44/0x57
> [48080.543861]  [<ffffffffa01c4fcd>] ? ext4_commit_super+0x191/0x1d3 [ext4]
> [48080.545251]  [<ffffffffa01c636e>] ? ext4_error_inode+0x4c/0xef [ext4]
> [48080.546654]  [<ffffffffa01b4275>] ? ext4_find_entry+0x1eb/0x298 [ext4]
> [48080.548096]  [<ffffffffa01b4350>] ? ext4_lookup+0x2e/0x11c [ext4]
> [48080.549522]  [<ffffffff8110b1d3>] ? __d_alloc+0x12c/0x13c
> [48080.550964]  [<ffffffff81102709>] ? d_alloc_and_lookup+0x3a/0x60
> [48080.552429]  [<ffffffff811031ad>] ? walk_component+0x219/0x406
> [48080.553934]  [<ffffffff810bdce1>] ? add_page_to_lru_list+0x64/0x64
> [48080.555443]  [<ffffffff81104041>] ? path_lookupat+0x7c/0x2bd
> [48080.556949]  [<ffffffff81036628>] ? should_resched+0x5/0x23
> [48080.558485]  [<ffffffff8134deec>] ? _cond_resched+0x7/0x1c
> [48080.560030]  [<ffffffff8110429e>] ? do_path_lookup+0x1c/0x87
> [48080.561541]  [<ffffffff81105d27>] ? user_path_at_empty+0x47/0x7b
> [48080.563129]  [<ffffffff81352198>] ? do_page_fault+0x30a/0x345
> [48080.564737]  [<ffffffff810fdd7a>] ? vfs_fstatat+0x32/0x60
> [48080.566340]  [<ffffffff810fdeb0>] ? sys_newstat+0x12/0x2b
> [48080.567920]  [<ffffffff810fa75e>] ? vfs_write+0xbb/0xe9
> [48080.569477]  [<ffffffff8134f7b5>] ? page_fault+0x25/0x30
> [48080.571036]  [<ffffffff81354212>] ? system_call_fastpath+0x16/0x1b
> [48080.572564] Code: ff b8 01 00 00 00 eb 02 31 c0 5a 5b 5d 41 5c 41 5d c3 41 54 55 89 fd 53 48 8b 06 48 89 f3 a8 04 75 02 0f 0b 48 8b 06 a8 20 75 02 <0f> 0b 48 83 7e 38 00 75 02 0f 0b 48 8b 06 f6 c4 02 74 02 0f 0b 
> [48080.575767] RIP  [<ffffffff8111ccc3>] submit_bh+0x19/0xff
> [48080.577256]  RSP <ffff88017a5e5a68>
> [48080.644282] ---[ end trace c597c77dca040243 ]---

This has apparently been fixed since then: under Linux 3.12.9, nbd
mounts keep working after resume.

I'm looking to backport the fix, but it's not obvious what that is.
Does anyone know what changes in the nbd kernel driver (or perhaps
elsewhere in the kernel) might have fixed this?

Ben.

-- 
Ben Hutchings
Absolutum obsoletum. (If it works, it's out of date.) - Stafford Beer
