
Re: [Nbd] nbd recovery after suspend/resume



Hi Paul,

On Friday 21 February 2014 22:54:43, Paul Clements wrote:
> Is it a kernel related thing, or could it be the fix to nbd-client signal
> handling as detailed in this thread?

I doubt that it is userspace-related. The bug log mentions no changes in
userspace, only the use of a more recent kernel.

However, that doesn't mean there are no such changes. Ernesto, did you change 
anything in the way you ran the NBD device?

If not, can you clarify how exactly you're establishing the client side of the 
NBD connection?
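
In case it helps with gathering that information, here is a rough sketch
(not something from the bug log) that looks up which process is attached to
an NBD device and how it was invoked. It assumes the nbd driver exposes the
client's PID in /sys/block/<device>/pid while the device is connected, and
that /proc/<pid>/cmdline is readable; the function name and its parameters
are made up for illustration.

```python
from pathlib import Path


def nbd_client_cmdline(device="nbd1", sys_root="/sys", proc_root="/proc"):
    """Return (pid, argv) for the process attached to an NBD device.

    Returns None if the device has no readable 'pid' attribute, which is
    assumed here to mean "absent or not connected".
    """
    pid_file = Path(sys_root) / "block" / device / "pid"
    try:
        pid = int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return None

    try:
        raw = (Path(proc_root) / str(pid) / "cmdline").read_bytes()
    except OSError:
        # The process may have exited between the two reads.
        return pid, []

    # /proc/<pid>/cmdline holds the arguments separated by NUL bytes.
    argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
    return pid, argv
```

Calling nbd_client_cmdline("nbd1") before suspend and again after resume
would show whether the client process is still attached and which arguments
it was started with.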

Thanks,

> https://www.mail-archive.com/nbd-general@lists.sourceforge.net/msg01568.html
> On Wed, Feb 19, 2014 at 6:10 PM, Ben Hutchings <ben@...1505...> wrote:
> > Ernesto reported that nbd mounts break after suspend/resume when running
> > Linux 3.2.51:
> > > [48080.515468] block nbd1: Attempted send on closed socket
> > > [48080.515473] end_request: I/O error, dev nbd1, sector 91896
> > > [48080.515718] block nbd1: Attempted send on closed socket
> > > [48080.515721] end_request: I/O error, dev nbd1, sector 91896
> > > [48080.515752] ------------[ cut here ]------------
> > > [48080.515863] kernel BUG at /build/linux-rrsxby/linux-3.2.51/fs/buffer.c:2917!
> > 
> > > [48080.516010] invalid opcode: 0000 [#1] SMP
> > > [48080.516176] CPU 0
> > > [48080.516188] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi nls_utf8 nls_cp437 vfat fat nbd cbc ecb vmnet(O) vsock(O) vmci(O) vmmon(O) parport_pc ppdev lp parport cpufreq_conservative bnep cpufreq_userspace cpufreq_stats cpufreq_powersave rfcomm 8021q garp stp binfmt_misc uinput nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop fuse ecryptfs dm_crypt dm_mod snd_hda_codec_hdmi snd_hda_codec_conexant pl2303 usbserial arc4 iwlwifi joydev btusb mac80211 bluetooth snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i915 drm_kms_helper snd_page_alloc drm iTCO_wdt iTCO_vendor_support snd_seq cfg80211 snd_seq_device snd_timer snd evdev soundcore i2c_i801 dell_laptop i2c_algo_bit i2c_core rfkill coretemp acpi_cpufreq mperf video pcspkr dcdbas psmouse dell_wmi ac serio_raw sparse_keymap processor button battery power_supply wmi ext4 crc16 jbd2 mbcache usbhid hid ums_realtek usb_storage sg sr_mod sd_mod cdrom crc_t10dif xhci_hcd crc32c_intel ghash_clmulni_intel aesni_intel ahci libahci aes_x86_64 thermal thermal_sys libata atl1c scsi_mod ehci_hcd aes_generic cryptd usbcore usb_common [last unloaded: scsi_wait_scan]
> > > [48080.520191]
> > > [48080.520931] Pid: 7672, comm: make Tainted: G           O 3.2.0-4-amd64 #1 Debian 3.2.51-1 Dell Inc.          Dell System Inspiron N411Z/
> > > [48080.521803] RIP: 0010:[<ffffffff8111ccc3>]  [<ffffffff8111ccc3>] submit_bh+0x19/0xff
> > > [48080.522674] RSP: 0018:ffff88017a5e5a68  EFLAGS: 00010246
> > > [48080.523557] RAX: 0000000000040005 RBX: ffff8800c947af68 RCX: 0000000000000004
> > > [48080.524480] RDX: 0000000000000000 RSI: ffff8800c947af68 RDI: 0000000000000211
> > > [48080.525417] RBP: 0000000000000211 R08: 0000000000000200 R09: ffffffff8168f0a0
> > > [48080.526246] R10: ffff880107a798c0 R11: ffff880107a798c0 R12: ffff8800c919e400
> > > [48080.527186] R13: 0000000000000001 R14: 000000000001f381 R15: 0000000003c94245
> > > [48080.528204] FS:  00007fea81a02700(0000) GS:ffff88019fa00000(0000) knlGS:0000000000000000
> > > [48080.529252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [48080.530326] CR2: 00000000019c8000 CR3: 00000001613a1000 CR4: 00000000000406f0
> > > [48080.531435] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [48080.532557] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > [48080.533691] Process make (pid: 7672, threadinfo ffff88017a5e4000, task ffff8800d066f650)
> > > [48080.534863] Stack:
> > > [48080.536040]  ffff8800c947af68 0000000000000211 ffff8800c919e400 ffffffff8111f577
> > > [48080.537291]  ffff8800d066f650 ffffffff811be4ff ffff8800c947af68 ffff8800c947af68
> > > [48080.538558]  ffff88015c1cac00 ffffffffa01c4fcd ffffffffa01e5e1d ffff88000dd8e840
> > > [48080.539848] Call Trace:
> > > [48080.541140]  [<ffffffff8111f577>] ? __sync_dirty_buffer+0x52/0x87
> > > [48080.542474]  [<ffffffff811be4ff>] ? __percpu_counter_sum+0x44/0x57
> > > [48080.543861]  [<ffffffffa01c4fcd>] ? ext4_commit_super+0x191/0x1d3 [ext4]
> > > [48080.545251]  [<ffffffffa01c636e>] ? ext4_error_inode+0x4c/0xef [ext4]
> > > [48080.546654]  [<ffffffffa01b4275>] ? ext4_find_entry+0x1eb/0x298 [ext4]
> > > [48080.548096]  [<ffffffffa01b4350>] ? ext4_lookup+0x2e/0x11c [ext4]
> > > [48080.549522]  [<ffffffff8110b1d3>] ? __d_alloc+0x12c/0x13c
> > > [48080.550964]  [<ffffffff81102709>] ? d_alloc_and_lookup+0x3a/0x60
> > > [48080.552429]  [<ffffffff811031ad>] ? walk_component+0x219/0x406
> > > [48080.553934]  [<ffffffff810bdce1>] ? add_page_to_lru_list+0x64/0x64
> > > [48080.555443]  [<ffffffff81104041>] ? path_lookupat+0x7c/0x2bd
> > > [48080.556949]  [<ffffffff81036628>] ? should_resched+0x5/0x23
> > > [48080.558485]  [<ffffffff8134deec>] ? _cond_resched+0x7/0x1c
> > > [48080.560030]  [<ffffffff8110429e>] ? do_path_lookup+0x1c/0x87
> > > [48080.561541]  [<ffffffff81105d27>] ? user_path_at_empty+0x47/0x7b
> > > [48080.563129]  [<ffffffff81352198>] ? do_page_fault+0x30a/0x345
> > > [48080.564737]  [<ffffffff810fdd7a>] ? vfs_fstatat+0x32/0x60
> > > [48080.566340]  [<ffffffff810fdeb0>] ? sys_newstat+0x12/0x2b
> > > [48080.567920]  [<ffffffff810fa75e>] ? vfs_write+0xbb/0xe9
> > > [48080.569477]  [<ffffffff8134f7b5>] ? page_fault+0x25/0x30
> > > [48080.571036]  [<ffffffff81354212>] ? system_call_fastpath+0x16/0x1b
> > > [48080.572564] Code: ff b8 01 00 00 00 eb 02 31 c0 5a 5b 5d 41 5c 41 5d c3 41 54 55 89 fd 53 48 8b 06 48 89 f3 a8 04 75 02 0f 0b 48 8b 06 a8 20 75 02 <0f> 0b 48 83 7e 38 00 75 02 0f 0b 48 8b 06 f6 c4 02 74 02 0f 0b
> > > [48080.575767] RIP  [<ffffffff8111ccc3>] submit_bh+0x19/0xff
> > > [48080.577256]  RSP <ffff88017a5e5a68>
> > > [48080.644282] ---[ end trace c597c77dca040243 ]---
> > 
> > This has apparently been fixed later, as in Linux 3.12.9 they keep
> > working after resume.
> > 
> > I'm looking to backport the fix, but it's not obvious what that is.
> > Does anyone know what changes in the nbd kernel driver (or perhaps
> > elsewhere in the kernel) might have fixed this?
> > 
> > Ben.
> > 
> > --
> > Ben Hutchings
> > Absolutum obsoletum. (If it works, it's out of date.) - Stafford Beer

-- 
This end should point toward the ground if you want to go to space.

If it starts pointing toward space you are having a bad problem and you
will not go to space today.

  -- http://xkcd.com/1133/



