Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Thu, 10 Jan 2013 20:29:43 -0800
John Darrah <xyllyx@gmail.com> wrote:
> On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
> > On Thu, 3 Jan 2013 21:29:22 -0800
> > John Darrah <xyllyx@gmail.com> wrote:
> >
> > > On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
> > > > On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
> > > > > On Sat, 29 Dec 2012 01:24:36 +0100
> > > > > Ben Hutchings <ben@decadent.org.uk> wrote:
> > > > >
> > > > > > On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
> > > > > > > On Sun, 23 Dec 2012 09:10:34 -0500
> > > > > > > Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > [...]
> > > > > > > > I had a look at the code today and suspect that I know what the problem
> > > > > > > > is. When the kernel goes to send a request, it first signs it and then
> > > > > > > > bumps the sequence numbers that it tracks. If the request doesn't
> > > > > > > > actually make it out onto the wire, like when the task catches a
> > > > > > > > signal, those sequence numbers remain high even though the request
> > > > > > > > didn't go out.
> > > > > > > >
> > > > > > > > Here's an untested patch that might help tell whether this is the
> > > > > > > > case. You may want to try it and see if it does. Note that this fix is
> > > > > > > > a bit of a kludge and is not suitable for merging!
> > > > > > > >
> > > > > > > > A better fix would involve changing when the sequence number gets
> > > > > > > > bumped in the first place. If this patch seems to help things, then
> > > > > > > > I'll look at coding up that up.
> > > > > > [...]
> > > > > > > I was able to reproduce this, and I don't think the above patch will
> > > > > > > fix it (at least not completely). The problem seems to be that the NT
> > > > > > > cancel command is screwing up the sequence numbers. We'll have to do
> > > > > > > some research to figure out why that's occurring.
> > > > > >
> > > > > > Jeff, we got a bug report in Debian which seems to be the same problem:
> > > > > > <http://bugs.debian.org/695492>. Please cc John Darrah and the bug
> > > > > > address as above.
> > > > > >
> > > > > > Ben.
> > > > > >
> > > > >
> > > > > You may want to try this patch. It seems to fix the problem for me, but
> > > > > I think there is probably some more work to do in this area.
> > > > >
> > > > > http://www.spinics.net/lists/linux-cifs/msg07576.html
> > > > >
> > > >
> > > > John, you can test this patch by following instructions at
> > > > <http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>.
> > > >
> > > > Please reply-to-all to Jeff's message when you have a result.
> > > >
> > > > Ben.
> > > >
> > >
>
> OK... I built a 3.2.35 kernel with the patch to transport.c
> and also a 3.7.1 with the patch to smb1ops.c and loaded them
> into my wheezy VM. I tested both by starting commands to
> frob the CIFS mounts and then typing a CTRL-C to kill the
> command, and they were stable (at least 50 attempts using
> each kernel with the CTRL-C fired at random times into the
> running command).
>
> But... now another issue affects both kernels. It seems that
> after 10 to 15 minutes of non use, the mount hangs and the
> command accessing the mount can only be killed with a
> SIGKILL... but only sometimes. Sometimes only a reboot
> would unwedge things.
>
> It seems when the mount would hang, I would get the:
> CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...
> error except the 3.7 kernel reported 120 seconds instead of
> the 300 seconds noted above.
>
Interesting, I haven't noticed that issue, but I'll try to reproduce it
when I get a chance.
> Below is one of the kernel logs after I SIGKILL'd things...
> it looks like I trigered a fault of some kind. Maybe it has
> some meaning (this log only happened once).
>
Hmmm...
Looks like a problem in the virtualbox code. Certainly doesn't appear
to be cifs-related. It seems like we saw something similar when all of
the lockless dcache stuff went upstream, so it may be that the vbox
stuff needs to be forward-ported to handle that correctly.
> -- john
>
>
> Jan 7 07:06:34 jax kernel: imklog 5.8.11, log source = /proc/kmsg started.
> Jan 7 07:06:34 jax kernel: [ 0.000000] Initializing cgroup subsys cpuset
> Jan 7 07:06:34 jax kernel: [ 0.000000] Initializing cgroup subsys cpu
> Jan 7 07:06:34 jax kernel: [ 0.000000] Linux version 3.2.0-4-486 (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 Debian 3.2.35-2
>
> <-a bunch removed->
>
> Jan 7 08:30:31 jax kernel: [ 17.072068] eth0: no IPv6 routers present
> Jan 7 08:31:17 jax kernel: [ 63.273900] FS-Cache: Netfs 'cifs' registered for caching
> Jan 7 08:31:17 jax kernel: [ 63.304164] CIFS VFS: default security mechanism requested. The default security mechanism will be upgraded from ntlm to ntlmv2 in kernel release 3.3
> Jan 7 08:51:20 jax kernel: [ 1266.602096] CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...
> Jan 7 08:51:20 jax kernel: [ 1266.602347] CIFS VFS: Server amifile02 has not responded in 300 seconds. Reconnecting...
> Jan 7 09:06:57 jax kernel: [ 2203.298637] ------------[ cut here ]------------
> Jan 7 09:06:57 jax kernel: [ 2203.298645] WARNING: at /root/linux-3.2.35/fs/dcache.c:1291 d_set_d_op+0x24/0x85()
> Jan 7 09:06:57 jax kernel: [ 2203.298648] Hardware name: VirtualBox
> Jan 7 09:06:57 jax kernel: [ 2203.298651] Modules linked in: des_generic ecb md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: scsi_wait_scan]
> Jan 7 09:06:57 jax kernel: [ 2203.298678] Pid: 2657, comm: bash Tainted: G O 3.2.0-4-486 #1 Debian 3.2.35-2a~test
> Jan 7 09:06:57 jax kernel: [ 2203.298681] Call Trace:
> Jan 7 09:06:57 jax kernel: [ 2203.298686] [<c1022a98>] ? warn_slowpath_common+0x68/0x79
> Jan 7 09:06:57 jax kernel: [ 2203.298690] [<c10b09c7>] ? d_set_d_op+0x24/0x85
> Jan 7 09:06:57 jax kernel: [ 2203.298694] [<c1022ab6>] ? warn_slowpath_null+0xd/0x10
> Jan 7 09:06:57 jax kernel: [ 2203.298697] [<c10b09c7>] ? d_set_d_op+0x24/0x85
> Jan 7 09:06:57 jax kernel: [ 2203.298702] [<f99a7f6f>] ? sf_instantiate.isra.0+0x118/0x15c [vboxsf]
> Jan 7 09:06:57 jax kernel: [ 2203.298707] [<f99a80da>] ? sf_create_aux+0x127/0x18b [vboxsf]
> Jan 7 09:06:57 jax kernel: [ 2203.298711] [<f99a7887>] ? sf_lookup+0x80/0x1df [vboxsf]
> Jan 7 09:06:57 jax kernel: [ 2203.298715] [<c10abd24>] ? vfs_create+0x53/0x6a
> Jan 7 09:06:57 jax kernel: [ 2203.298718] [<c10ac61c>] ? do_last+0x216/0x512
> Jan 7 09:06:57 jax kernel: [ 2203.298722] [<c10acbc0>] ? path_openat+0xa2/0x298
> Jan 7 09:06:57 jax kernel: [ 2203.298725] [<c10ace5f>] ? do_filp_open+0x23/0x5c
> Jan 7 09:06:57 jax kernel: [ 2203.298729] [<c109d2d5>] ? kmem_cache_alloc+0x39/0x89
> Jan 7 09:06:57 jax kernel: [ 2203.298734] [<c12816ad>] ? _cond_resched+0x5/0x18
> Jan 7 09:06:57 jax kernel: [ 2203.298740] [<c10a329f>] ? do_sys_open+0x54/0xd2
> Jan 7 09:06:57 jax kernel: [ 2203.298744] [<c10a333b>] ? sys_open+0x1e/0x23
> Jan 7 09:06:57 jax kernel: [ 2203.298748] [<c128245c>] ? syscall_call+0x7/0xb
> Jan 7 09:06:57 jax kernel: [ 2203.298751] ---[ end trace 5c4181d85183f288 ]---
> Jan 7 09:06:57 jax kernel: [ 2203.298754] ------------[ cut here ]------------
> Jan 7 09:06:57 jax kernel: [ 2203.298757] WARNING: at /root/linux-3.2.35/fs/dcache.c:1295 d_set_d_op+0x48/0x85()
> Jan 7 09:06:57 jax kernel: [ 2203.298760] Hardware name: VirtualBox
> Jan 7 09:06:57 jax kernel: [ 2203.298763] Modules linked in: des_generic ecb md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: scsi_wait_scan]
> Jan 7 09:06:57 jax kernel: [ 2203.298781] Pid: 2657, comm: bash Tainted: G W O 3.2.0-4-486 #1 Debian 3.2.35-2a~test
> Jan 7 09:06:57 jax kernel: [ 2203.298784] Call Trace:
> Jan 7 09:06:57 jax kernel: [ 2203.298787] [<c1022a98>] ? warn_slowpath_common+0x68/0x79
> Jan 7 09:06:57 jax kernel: [ 2203.298791] [<c10b09eb>] ? d_set_d_op+0x48/0x85
> Jan 7 09:06:57 jax kernel: [ 2203.298794] [<c1022ab6>] ? warn_slowpath_null+0xd/0x10
> Jan 7 09:06:57 jax kernel: [ 2203.298798] [<c10b09eb>] ? d_set_d_op+0x48/0x85
> Jan 7 09:06:57 jax kernel: [ 2203.298802] [<f99a7f6f>] ? sf_instantiate.isra.0+0x118/0x15c [vboxsf]
> Jan 7 09:06:57 jax kernel: [ 2203.298806] [<f99a80da>] ? sf_create_aux+0x127/0x18b [vboxsf]
> Jan 7 09:06:57 jax kernel: [ 2203.298810] [<f99a7887>] ? sf_lookup+0x80/0x1df [vboxsf]
> Jan 7 09:06:57 jax kernel: [ 2203.298814] [<c10abd24>] ? vfs_create+0x53/0x6a
> Jan 7 09:06:57 jax kernel: [ 2203.298818] [<c10ac61c>] ? do_last+0x216/0x512
> Jan 7 09:06:57 jax kernel: [ 2203.298821] [<c10acbc0>] ? path_openat+0xa2/0x298
> Jan 7 09:06:57 jax kernel: [ 2203.298824] [<c10ace5f>] ? do_filp_open+0x23/0x5c
> Jan 7 09:06:57 jax kernel: [ 2203.298828] [<c109d2d5>] ? kmem_cache_alloc+0x39/0x89
> Jan 7 09:06:57 jax kernel: [ 2203.298832] [<c12816ad>] ? _cond_resched+0x5/0x18
> Jan 7 09:06:57 jax kernel: [ 2203.298835] [<c10a329f>] ? do_sys_open+0x54/0xd2
> Jan 7 09:06:57 jax kernel: [ 2203.298839] [<c10a333b>] ? sys_open+0x1e/0x23
> Jan 7 09:06:57 jax kernel: [ 2203.298843] [<c128245c>] ? syscall_call+0x7/0xb
> Jan 7 09:06:57 jax kernel: [ 2203.298846] ---[ end trace 5c4181d85183f289 ]---
>
> <log-end>
--
Jeff Layton <jlayton@redhat.com>
Reply to: