Re: [Nbd] nbd-client and SIGSTOP/SIGCONT

To: Ian <coughlan@...866...>
Cc: nbd-general@lists.sourceforge.net
Subject: Re: [Nbd] nbd-client and SIGSTOP/SIGCONT
From: Paul Clements <paul.clements@...124...>
Date: Sat, 16 Apr 2011 14:18:53 -0400
Message-id: <BANLkTikhxVX0QTQaCTrW4P8fL42m9a4s1g@...18...>
Reply-to: paul.clements@...856...
In-reply-to: <18F04FC1323C451FAA5F1A46D91AFEF3@...867...>
References: <18F04FC1323C451FAA5F1A46D91AFEF3@...867...>

On Fri, Apr 15, 2011 at 5:50 PM, Ian <coughlan@...866...> wrote:

> However, when I want to shut it down, things get messy.  The shutdown
> scripts call killall5, and I have populated the appropriate directory with
> the correct "omit" pids, and I see that killall5 doesn't try to kill either
> the server or the clients.  But the nbd-client for the mounted filesystem
> dies anyway, which means the subsequent file IO is broken, and the remaining
> scripts are not executed.  I have added debug to nbd-client, and the
> ioctl(nbd, NBD_DO_IT) call is returning after running killall5.  Then I
> wrote a simple test file that doesn't kill any processes at all, but does
> the signal(-1, SIGSTOP) followed by signal(-1, SIGCONT), just as killall5
> does, and the nbd-client still dies.  Delving deeper into the nbd kernel
> driver, the wait_event_interruptible() call in nbd_find_request() is
> returning -ERESTARTSYS after the  SIGSTOP/SIGCONT sequence has been run.
> This return value is returned from nbd_ioctl() as it should be.  I would
> think that the ioctl(nbd, NBD_DO_IT) would be restarted, but the ioctl()
> still returns to the nbd-client.  And, of course, the socket and nbd_thread
> have been shutdown by the time nbd_ioctl() returns.

Ah, I see...what errno does nbd-client see? Is it ERESTARTSYS or EINTR?
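
One way to answer that is to log the raw errno right after the ioctl
fails. A rough sketch, not taken from nbd-client:
classify_ioctl_errno() and log_ioctl_errno() are hypothetical helper
names, and 512 is the value of ERESTARTSYS in the kernel's
include/linux/errno.h (userspace headers do not define it, since it is
not supposed to reach userland):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal value from include/linux/errno.h; spelled out here
 * because userspace <errno.h> has no ERESTARTSYS. */
#define KERNEL_ERESTARTSYS 512

/* Hypothetical helper: name the errno values of interest. */
const char *classify_ioctl_errno(int err)
{
    if (err == EINTR)
        return "EINTR";
    if (err == KERNEL_ERESTARTSYS)
        return "ERESTARTSYS";
    return "other";
}

/* Hypothetical helper: print the numeric value, its classification,
 * and the libc description of the error. */
void log_ioctl_errno(FILE *out, int err)
{
    fprintf(out, "NBD_DO_IT failed: errno=%d [%s] (%s)\n",
            err, classify_ioctl_errno(err), strerror(err));
}
```

Calling log_ioctl_errno(stderr, errno) immediately after
ioctl(nbd, NBD_DO_IT) returns -1 would show which of the two it is.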

I think we need to quit the ioctl without shutting everything down in
the SIGSTOP case...

$ git diff drivers/block/nbd.c
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index e6fc716..0c426a4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -667,6 +667,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nb
                kthread_stop(thread);

                mutex_lock(&lo->tx_lock);
+               /* if SIGSTOP occurred, exit here and allow nbd-client to retry
+                  the nbd_do_it ioctl */
+               if (lo->harderror == -ERESTARTSYS)
+                       return -ERESTARTSYS;
                if (error)
                        return error;
                sock_shutdown(lo, 0);

I think that with this patch, plus nbd-client reissuing the ioctl from
userland, the retry would work...

Is it possible for you to try this patch? I'm not sure whether you also
have to patch nbd-client, or whether the kernel is already retrying the
ioctl for you automatically.
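
If the kernel turns out not to restart the ioctl by itself, a userland
retry could look roughly like the sketch below. It is only an
illustration under assumptions: retry_on_eintr() and the fake_do_it()
stand-in are hypothetical names, not existing nbd-client code, and it
assumes the interrupted ioctl shows up in userland as -1 with errno set
to EINTR.

```c
#include <errno.h>

/* Signature of the blocking call to retry; returns -1 and sets errno
 * on failure, like ioctl(). */
typedef int (*blocking_op)(void *arg);

/* Hypothetical helper: call op(arg) again while it fails with EINTR,
 * giving up after max_tries attempts. Any other result is returned
 * to the caller unchanged. */
int retry_on_eintr(blocking_op op, void *arg, int max_tries)
{
    int ret = -1;

    for (int i = 0; i < max_tries; i++) {
        ret = op(arg);
        if (ret >= 0 || errno != EINTR)
            break;
    }
    return ret;
}

/* Stand-in for the real ioctl so the sketch can be exercised without
 * an nbd device: fails with EINTR eintr_left times, then succeeds. */
struct fake_state {
    int eintr_left;
    int calls;
};

int fake_do_it(void *arg)
{
    struct fake_state *s = arg;

    s->calls++;
    if (s->eintr_left > 0) {
        s->eintr_left--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

In nbd-client the callback would instead wrap the real call, something
like "return ioctl(*(int *)arg, NBD_DO_IT);" with the nbd file
descriptor passed through arg. The attempt cap is there so that a
persistent failure cannot turn into a busy loop.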

The long-term answer might be to make nbd_do_it its own kernel thread
so this SIGSTOP problem is avoided, but that's a major change to how
nbd works.

--
Paul
