[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)



On Thu, 10 Jan 2013 20:29:43 -0800
John Darrah <xyllyx@gmail.com> wrote:

> On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
> > On Thu, 3 Jan 2013 21:29:22 -0800
> > John Darrah <xyllyx@gmail.com> wrote:
> > 
> > > On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
> > > > On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
> > > > > On Sat, 29 Dec 2012 01:24:36 +0100
> > > > > Ben Hutchings <ben@decadent.org.uk> wrote:
> > > > > 
> > > > > > On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
> > > > > > > On Sun, 23 Dec 2012 09:10:34 -0500
> > > > > > > Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > [...]
> > > > > > > > I had a look at the code today and suspect that I know what the problem
> > > > > > > > is. When the kernel goes to send a request, it first signs it and then
> > > > > > > > bumps the sequence numbers that it tracks. If the request doesn't
> > > > > > > > actually make it out onto the wire, like when the task catches a
> > > > > > > > signal, those sequence numbers remain high even though the request
> > > > > > > > didn't go out.
> > > > > > > > 
> > > > > > > > Here's an untested patch that might help tell whether this is the
> > > > > > > > case. You may want to try it and see if it does. Note that this fix is
> > > > > > > > a bit of a kludge and is not suitable for merging!
> > > > > > > > 
> > > > > > > > A better fix would involve changing when the sequence number gets
> > > > > > > > bumped in the first place. If this patch seems to help things, then
> > > > > > > > I'll look at coding up that up.
> > > > > > [...]
> > > > > > > I was able to reproduce this, and I don't think the above patch will
> > > > > > > fix it (at least not completely). The problem seems to be that the NT
> > > > > > > cancel command is screwing up the sequence numbers. We'll have to do
> > > > > > > some research to figure out why that's occurring.
> > > > > > 
> > > > > > Jeff, we got a bug report in Debian which seems to be the same problem:
> > > > > > <http://bugs.debian.org/695492>.  Please cc John Darrah and the bug
> > > > > > address as above.
> > > > > > 
> > > > > > Ben.
> > > > > > 
> > > > > 
> > > > > You may want to try this patch. It seems to fix the problem for me, but
> > > > > I think there is probably some more work to do in this area.
> > > > > 
> > > > > http://www.spinics.net/lists/linux-cifs/msg07576.html
> > > > > 
> > > > 
> > > > John, you can test this patch by following instructions at
> > > > <http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>.
> > > > 
> > > > Please reply-to-all to Jeff's message when you have a result.
> > > > 
> > > > Ben.
> > > > 
> > > 
> 
> OK... I built a 3.2.35 kernel with the patch to transport.c 
> and also a 3.7.1 with the patch to smb1ops.c and loaded them 
> into my wheezy VM. I tested both by starting commands to 
> frob the CIFS mounts and then typing a CTRL-C to kill the 
> command, and they were stable (at least 50 attempts using 
> each kernel with the CTRL-C fired at random times into the 
> running command).
> 
> But... now another issue affects both kernels. It seems that 
> after 10 to 15 minutes of non use, the mount hangs and the 
> command accessing the mount can only be killed with a 
> SIGKILL... but only sometimes. Sometimes only a reboot 
> would unwedge things.
> 
> It seems when the mount would hang, I would get the:
>   CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...
> error except the 3.7 kernel reported 120 seconds instead of 
> the 300 seconds noted above.
> 

Interesting, I haven't noticed that issue, but I'll try to reproduce it
when I get a chance.

> Below is one of the kernel logs after I SIGKILL'd things... 
> it looks like I trigered a fault of some kind. Maybe it has 
> some meaning (this log only happened once).
> 

Hmmm...

Looks like a problem in the virtualbox code. Certainly doesn't appear
to be cifs-related. It seems like we saw something similar when all of
the lockless dcache stuff went upstream, so it may be that the vbox
stuff needs to be forward-ported to handle that correctly.

> -- john
> 
> 
> Jan  7 07:06:34 jax kernel: imklog 5.8.11, log source = /proc/kmsg started.
> Jan  7 07:06:34 jax kernel: [    0.000000] Initializing cgroup subsys cpuset
> Jan  7 07:06:34 jax kernel: [    0.000000] Initializing cgroup subsys cpu
> Jan  7 07:06:34 jax kernel: [    0.000000] Linux version 3.2.0-4-486 (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 Debian 3.2.35-2
> 
> <-a bunch removed->
> 
> Jan  7 08:30:31 jax kernel: [   17.072068] eth0: no IPv6 routers present
> Jan  7 08:31:17 jax kernel: [   63.273900] FS-Cache: Netfs 'cifs' registered for caching
> Jan  7 08:31:17 jax kernel: [   63.304164] CIFS VFS: default security mechanism requested.  The default security mechanism will be upgraded from ntlm to ntlmv2 in kernel release 3.3
> Jan  7 08:51:20 jax kernel: [ 1266.602096] CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...
> Jan  7 08:51:20 jax kernel: [ 1266.602347] CIFS VFS: Server amifile02 has not responded in 300 seconds. Reconnecting...
> Jan  7 09:06:57 jax kernel: [ 2203.298637] ------------[ cut here ]------------
> Jan  7 09:06:57 jax kernel: [ 2203.298645] WARNING: at /root/linux-3.2.35/fs/dcache.c:1291 d_set_d_op+0x24/0x85()
> Jan  7 09:06:57 jax kernel: [ 2203.298648] Hardware name: VirtualBox
> Jan  7 09:06:57 jax kernel: [ 2203.298651] Modules linked in: des_generic ecb md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: scsi_wait_scan]
> Jan  7 09:06:57 jax kernel: [ 2203.298678] Pid: 2657, comm: bash Tainted: G           O 3.2.0-4-486 #1 Debian 3.2.35-2a~test
> Jan  7 09:06:57 jax kernel: [ 2203.298681] Call Trace:
> Jan  7 09:06:57 jax kernel: [ 2203.298686]  [<c1022a98>] ? warn_slowpath_common+0x68/0x79
> Jan  7 09:06:57 jax kernel: [ 2203.298690]  [<c10b09c7>] ? d_set_d_op+0x24/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298694]  [<c1022ab6>] ? warn_slowpath_null+0xd/0x10
> Jan  7 09:06:57 jax kernel: [ 2203.298697]  [<c10b09c7>] ? d_set_d_op+0x24/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298702]  [<f99a7f6f>] ? sf_instantiate.isra.0+0x118/0x15c [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298707]  [<f99a80da>] ? sf_create_aux+0x127/0x18b [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298711]  [<f99a7887>] ? sf_lookup+0x80/0x1df [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298715]  [<c10abd24>] ? vfs_create+0x53/0x6a
> Jan  7 09:06:57 jax kernel: [ 2203.298718]  [<c10ac61c>] ? do_last+0x216/0x512
> Jan  7 09:06:57 jax kernel: [ 2203.298722]  [<c10acbc0>] ? path_openat+0xa2/0x298
> Jan  7 09:06:57 jax kernel: [ 2203.298725]  [<c10ace5f>] ? do_filp_open+0x23/0x5c
> Jan  7 09:06:57 jax kernel: [ 2203.298729]  [<c109d2d5>] ? kmem_cache_alloc+0x39/0x89
> Jan  7 09:06:57 jax kernel: [ 2203.298734]  [<c12816ad>] ? _cond_resched+0x5/0x18
> Jan  7 09:06:57 jax kernel: [ 2203.298740]  [<c10a329f>] ? do_sys_open+0x54/0xd2
> Jan  7 09:06:57 jax kernel: [ 2203.298744]  [<c10a333b>] ? sys_open+0x1e/0x23
> Jan  7 09:06:57 jax kernel: [ 2203.298748]  [<c128245c>] ? syscall_call+0x7/0xb
> Jan  7 09:06:57 jax kernel: [ 2203.298751] ---[ end trace 5c4181d85183f288 ]---
> Jan  7 09:06:57 jax kernel: [ 2203.298754] ------------[ cut here ]------------
> Jan  7 09:06:57 jax kernel: [ 2203.298757] WARNING: at /root/linux-3.2.35/fs/dcache.c:1295 d_set_d_op+0x48/0x85()
> Jan  7 09:06:57 jax kernel: [ 2203.298760] Hardware name: VirtualBox
> Jan  7 09:06:57 jax kernel: [ 2203.298763] Modules linked in: des_generic ecb md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: scsi_wait_scan]
> Jan  7 09:06:57 jax kernel: [ 2203.298781] Pid: 2657, comm: bash Tainted: G        W  O 3.2.0-4-486 #1 Debian 3.2.35-2a~test
> Jan  7 09:06:57 jax kernel: [ 2203.298784] Call Trace:
> Jan  7 09:06:57 jax kernel: [ 2203.298787]  [<c1022a98>] ? warn_slowpath_common+0x68/0x79
> Jan  7 09:06:57 jax kernel: [ 2203.298791]  [<c10b09eb>] ? d_set_d_op+0x48/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298794]  [<c1022ab6>] ? warn_slowpath_null+0xd/0x10
> Jan  7 09:06:57 jax kernel: [ 2203.298798]  [<c10b09eb>] ? d_set_d_op+0x48/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298802]  [<f99a7f6f>] ? sf_instantiate.isra.0+0x118/0x15c [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298806]  [<f99a80da>] ? sf_create_aux+0x127/0x18b [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298810]  [<f99a7887>] ? sf_lookup+0x80/0x1df [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298814]  [<c10abd24>] ? vfs_create+0x53/0x6a
> Jan  7 09:06:57 jax kernel: [ 2203.298818]  [<c10ac61c>] ? do_last+0x216/0x512
> Jan  7 09:06:57 jax kernel: [ 2203.298821]  [<c10acbc0>] ? path_openat+0xa2/0x298
> Jan  7 09:06:57 jax kernel: [ 2203.298824]  [<c10ace5f>] ? do_filp_open+0x23/0x5c
> Jan  7 09:06:57 jax kernel: [ 2203.298828]  [<c109d2d5>] ? kmem_cache_alloc+0x39/0x89
> Jan  7 09:06:57 jax kernel: [ 2203.298832]  [<c12816ad>] ? _cond_resched+0x5/0x18
> Jan  7 09:06:57 jax kernel: [ 2203.298835]  [<c10a329f>] ? do_sys_open+0x54/0xd2
> Jan  7 09:06:57 jax kernel: [ 2203.298839]  [<c10a333b>] ? sys_open+0x1e/0x23
> Jan  7 09:06:57 jax kernel: [ 2203.298843]  [<c128245c>] ? syscall_call+0x7/0xb
> Jan  7 09:06:57 jax kernel: [ 2203.298846] ---[ end trace 5c4181d85183f289 ]---
> 
> <log-end>

-- 
Jeff Layton <jlayton@redhat.com>


Reply to: