
Re: [Nbd] Restart without killing clients



On Thu, Oct 06, 2016 at 02:45:58PM +0100, Alex Bligh wrote:
> 
> > On 6 Oct 2016, at 13:43, Wouter Verhelst <w@...112...> wrote:
> > 
> > One major limitation which nbd-server currently has is that you can't
> > restart it without killing all existing connections. This isn't ideal.
> > The reason I wrote it that way is that when you shut down the server,
> > it should really stop all processes.
> > 
> > However, recently I've become more and more convinced that in the case
> > of nbd, this doesn't really make much sense. Sure, when a webserver
> > restarts, it doesn't hurt to restart the forked off processes of that
> > webserver, since they're all serving short-lived TCP connections anyway.
> > However, the same is simply not true for nbd; we serve long-lived TCP
> > connections, and killing the child processes of the master nbd-server
> > impacts clients.
> > 
> > I think it's time I thought about a way to restart the server without
> > impacting existing clients. Since nbd uses a fork-per-client method,
> > this shouldn't be too hard (just stop proxying signals from the master
> > process to its child processes, instead just shutdown the master process
> > and be done with it). However, it might be interesting to allow clients
> > to remain running when restarting the server, but not when shutting it
> > down, even if the server had been restarted at some prior point. I guess
> > this should be possible using some IPC method, but I'm not sure whether
> > the extra complexity required for that is worth it.
> 
> It would be useful to understand more what you need to do.

- Restart after configuration updates (yes, this could be done by
  rereading configuration; we already support adding new configuration
  entries, but not yet removing or updating them)
- Restart after package upgrades (can't be done by rereading
  configuration, but it *could* be fine if older connections remain open
  with older server binaries)
- Restart after security updates (which presumably wants to update
  everything, not just the master server)

> gonbdserver can reread its configuration transparently.

That only catches one out of the three scenarios above.

> As I understand it, the kernel and nbd-client are now meant to
> reestablish the TCP session if it is interrupted. I have no idea how
> well this actually works in practice.

Not sure myself.

> If it does work in practice,
> then one can ALMOST do this simply by terminating the listen(),
> using shutdown(fd, SHUT_RD) on the socket, and processing and replying
> to everything that comes through, then shutdown(fd, SHUT_RDWR) and
> finally closing the socket. I'd be worried about commands stuck in the
> TCP queue client side, but it might be just as easy to fix that there.
> 
> I'm guessing that you're looking for something cleaner involving
> some slightly more complicated signalling?
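For reference, the shutdown sequence described in the quoted text can be sketched roughly as below. This is only an illustration, not nbd-server code: drain_and_close(), handle_fn and echo_reply() are made-up names, and it assumes Linux behaviour, where data already queued on the socket can still be read after shutdown(fd, SHUT_RD). Other systems may discard that data.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request handler: gets the bytes read from the socket and
 * is responsible for writing any reply itself. */
typedef void (*handle_fn)(int fd, const char *buf, size_t len);

/* Trivial example handler that echoes the bytes back as the "reply". */
void echo_reply(int fd, const char *buf, size_t len)
{
    (void)send(fd, buf, len, MSG_NOSIGNAL);
}

/* Stop reading, answer what is already queued, then shut down and close.
 * Returns the number of bytes drained, or -1 on error.
 * Assumption: queued data stays readable after SHUT_RD (true on Linux). */
ssize_t drain_and_close(int fd, handle_fn handle)
{
    char buf[4096];
    ssize_t total = 0;
    ssize_t n;

    assert(handle != NULL);
    if (shutdown(fd, SHUT_RD) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            handle(fd, buf, (size_t)n);   /* process and reply */
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                     /* interrupted, retry */
        break;                            /* 0 = drained, <0 = error */
    }
    shutdown(fd, SHUT_RDWR);
    close(fd);
    return n < 0 ? -1 : total;
}
```

This does nothing about commands still sitting in the client's send queue, which is the concern raised above.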

The signalling I was thinking about was a unix domain socket, SysV IPC,
or something similar, through which the "master" server (the one
listen()ing) could broadcast a message to the "child" servers (the ones
waiting for and processing requests) telling them to stop working.
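A minimal sketch of that kind of broadcast, assuming the master holds one connected unix domain socket per child (for instance from socketpair() before fork()). The command byte and the names broadcast_cmd() and poll_cmd() are hypothetical; nothing like this exists in nbd-server today.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical one-byte control commands (not an nbd-server API). */
enum ctl_cmd { CTL_DRAIN_AND_EXIT = 'D' };

/* Master side: send cmd on every child control socket.
 * Returns the number of children the command was delivered to. */
size_t broadcast_cmd(const int *chans, size_t n, enum ctl_cmd cmd)
{
    unsigned char byte = (unsigned char)cmd;
    size_t delivered = 0;

    assert(chans != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        ssize_t r;
        do {
            /* MSG_NOSIGNAL: a dead child must not SIGPIPE the master */
            r = send(chans[i], &byte, 1, MSG_NOSIGNAL);
        } while (r < 0 && errno == EINTR);
        if (r == 1)
            delivered++;
    }
    return delivered;
}

/* Child side: non-blocking check of the control socket.
 * Returns the command byte, 0 if nothing is pending, or -1 on error
 * or if the master end has gone away. */
int poll_cmd(int chan)
{
    unsigned char byte;
    ssize_t r = recv(chan, &byte, 1, MSG_DONTWAIT);

    if (r == 1)
        return byte;
    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

A child would call poll_cmd() between requests (or add the control socket to its poll set). Surviving a restart of the master needs more than this, since the new master does not inherit these sockets; the children would have to reconnect to a well-known socket path instead.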

The more complete scheme that I can think of would work something like
this:

- Master server gets a "please restart" message, and restarts
- New master server starts up, takes control of the unix domain socket,
  and asks all child processes to terminate
- Child processes terminate like so:
  - Stop reading new commands from their socket
  - Process any outstanding commands that they have read from the socket
    but not replied to yet
  - Pass the file descriptor for the socket to the master server using
    unix domain socket ancillary data, along with any configuration the
    server should know about
  - exit()
- Upon receiving a file descriptor and configuration data from a child
  process, the new master server checks whether the old configuration
  sufficiently matches the new one, and if so, fork()s a new child to
  deal with handling that socket.

That last step is going to be pretty horrible though.
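The file descriptor hand-off itself is the easy part. A rough sketch using SCM_RIGHTS ancillary data follows; send_client_fd() and recv_client_fd() are made-up names, and the configuration is assumed to be a short NUL-terminated string sent as the ordinary payload, which is a simplification and not an actual wire format.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Child side: pass client_fd over the unix domain socket chan, with a
 * configuration string (hypothetical format) as regular payload.
 * Returns 0 on success, -1 on error. */
int send_client_fd(int chan, int client_fd, const char *cfg)
{
    struct iovec iov = { .iov_base = (void *)cfg,
                         .iov_len = strlen(cfg) + 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;           /* payload is a file descriptor */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &client_fd, sizeof(int));

    return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
}

/* Master side: receive one descriptor plus its configuration string.
 * Returns the new descriptor, or -1 if none arrived. */
int recv_client_fd(int chan, char *cfg, size_t cfglen)
{
    struct iovec iov = { .iov_base = cfg, .iov_len = cfglen };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    assert(cfg != NULL && cfglen > 0);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    n = recvmsg(chan, &msg, 0);
    if (n <= 0)
        return -1;
    /* make sure the configuration string is terminated */
    cfg[(size_t)n < cfglen ? (size_t)n : cfglen - 1] = '\0';

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS) {
            int fd;
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            return fd;
        }
    }
    return -1;
}
```

The new master would call recv_client_fd() on the control socket, compare the received configuration against its own, and only then fork() a child for the descriptor. That comparison is the horrible part and is not shown here.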

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


