
Re: [Nbd] Question about the expected behaviour of nbd-server for async ops



Goswin,

--On 29 May 2011 16:03:18 +0200 Goswin von Brederlow <goswin-v-b@...186...> wrote:

I'm not sure what you mean by "synchronously". The current client issues
requests and processes replies asynchronously, i.e. there may be more
than one outstanding.

The server doesn't, currently. So it will have replied to all requests
before it reads the NBD_CMD_DISC.

With the current server. However, despite a reply being queued
in respect of all preceding requests, it may be that not all of
those packets have reached the wire. They may still be sitting in
a socket buffer.

If the server were asynchronous then yes. Now there might be an
in-flight request, but it will be completed before the server dies. And I
believe that behaviour should remain.

I think you are misunderstanding the problem. A client sends a normal
command, followed by a disconnect (which is legal). The server then
accepts the normal command, processes it, and sends a reply (by which
I mean "does a write() to the socket"). The server then processes the
NBD_CMD_DISC, which does a close() on the socket and exits the
process. My concern is that this does not necessarily mean that the
data enqueued in the socket buffer is in fact sent on the wire.
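
To make the failure mode concrete, here is a minimal sketch of that
sequence (the function and names are illustrative, not the actual
nbd-server code):

#include <unistd.h>
#include <stdlib.h>

static void serve(int sock)
{
    char reply[16] = {0};               /* stand-in for a struct nbd_reply */

    /* ... read a normal request, process it ... */
    write(sock, reply, sizeof reply);   /* reply may only reach the SNDBUF */

    /* ... read NBD_CMD_DISC ... */
    close(sock);                        /* nothing past this point guarantees  */
    exit(0);                            /* the buffered reply reaches the wire */
}

Whether the kernel flushes that SNDBUF after the close() is exactly
the portability question discussed below.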

This does not affect the current client in normal operations because
it does not send the NBD_CMD_DISC until all replies have been received;
however, this is a matter of observation, and not of coding guarantee.

All I am saying is we should take action to ensure we do send any
queued replies (queued here means "processed and in the socket's
SNDBUF") prior to any action which might cause them to be junked.

If you look at Stevens' UNIX Network Programming, p. 202, on SO_LINGER,
there are 3 possible behaviours:

1. l_onoff is 0: l_linger is ignored, close() returns immediately
  and TCP will still try to deliver the data to the peer.

2. l_onoff is nonzero, and l_linger is zero. TCP aborts the connection
  when it is closed: ***that is TCP discards any data still remaining
  in the socket send buffer and sends an RST to the peer*** (this
  is what we should avoid).

3. l_onoff is nonzero and l_linger is nonzero: the kernel lingers, i.e.
  close() blocks until either the data is sent and acknowledged or the
  linger time expires (which of the two happened is indicated by the
  error code). (This is OK provided we use a sufficient linger time.)
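
For concreteness, a minimal sketch of how those three settings map onto
setsockopt() (illustrative only; error checking omitted):

#include <sys/socket.h>

/* Apply one of the three SO_LINGER settings to a connected TCP socket. */
static void set_linger(int sock, int onoff, int seconds)
{
    struct linger lo;

    lo.l_onoff  = onoff;
    lo.l_linger = seconds;
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &lo, sizeof lo);
}

/* 1. set_linger(sock, 0, 0)  - close() returns at once, the kernel
 *                              still tries to deliver the data.
 * 2. set_linger(sock, 1, 0)  - abortive close: data in the send buffer
 *                              is discarded and an RST is sent.
 * 3. set_linger(sock, 1, 10) - close() blocks until the data is sent
 *                              and acknowledged or 10 seconds elapse.  */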

That implies that l_onoff=0 l_linger=0 would be fine. However, under
(e.g.) SVR4, close() can lose data. See (e.g.):

http://tinyurl.com/odlj5
Closing a socket: if SO_LINGER has not been called on a socket, then
close() is not supposed to discard data. This is true on SVR4.2 (and,
apparently, on all non-SVR4 systems) but apparently not on SVR4; the use
of either shutdown() or SO_LINGER seems to be required to guarantee
delivery of all data.

In general terms it is not sufficient to rely solely on close() to
send data in portable programs.


Here is what I think should happen:
- on receiving a NBD_CMD_DISC request you shutdown(fd, SHUT_RD)
- process and reply to any pending requests
- fsync() /* implicit flush, just to be nice */
- shutdown(fd, SHUT_WR)
- close(fd)
- exit()

That alone doesn't help (I am not sure we do the shutdown but
it might be an improvement).

I missed the shutdown(SHUT_WR) here - that will indeed do what is
required.

What I was saying is that the shutdown(SHUT_RD) is unnecessary
and insufficient (see below for why).

I presume you mean fsync() the backing store (as opposed to the
socket, here) - yes, I think that's a good idea. So I would do:
	fsync(backing store)
	shutdown(socket, SHUT_WR)
	close(fd)
	exit()
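
As a sketch in C (store_fd and handle_disconnect are illustrative
names, not the actual nbd-server code; error handling omitted):

#include <sys/socket.h>
#include <unistd.h>
#include <stdlib.h>

/* Teardown on receipt of NBD_CMD_DISC, following the sequence above. */
static void handle_disconnect(int store_fd, int sock)
{
    fsync(store_fd);            /* flush the backing store               */
    shutdown(sock, SHUT_WR);    /* queue a FIN after any pending replies */
    close(sock);
    exit(EXIT_SUCCESS);
}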

It (or close) blocks with SO_LINGER set, or goes into the background
otherwise. Also, lingering is always done on exit.

I can't find a reference for the assertion that there is automatic lingering
on process exit on all platforms. My understanding is that exiting
a process does an implicit close() on its fds, and that's all, but I might
be wrong.

From all Google can find for me, the kernel will still try to send any
remaining data in the outgoing socket buffer until the SO_LINGER timeout
or the TCP timeout kills the socket for good. There seems to be no way to
"wait for all data to be sent" on a socket prior to closing it.

Sure, but nbd-server is a portable program.

I was thinking of a buggy or malicious client. Say there is a bug in the
Linux kernel so it sends the NBD_CMD_DISC followed by a NBD_CMD_READ.
Then we tear down the connection and never reply to the READ. Is that
better than replying with an error to the READ?

We tear down the client, and exit the process. The socket is closed,
so the client will get EPIPE. If the client is buggy or malicious,
that's no better than it deserves!

You forget the network latency. The client can send additional commands
before the server can tear down the socket. Remember the assumption is a
buggy or malicious client. Clearly a correct client should never ever
send anything after NBD_CMD_DISC. It should probably even shut down the
writing side of its socket.

I don't think network latency comes into it. The NBD_CMD_DISC is sent by
the client. Unix stream sockets (and TCP) are ordered, therefore anything
the client sends after the _DISC will be received after the _DISC is
received. On receipt of the _DISC, the server closes the socket before
it does another select() or has any chance of reading anything else.
The close() will discard any data in RCVBUF. So I don't think a buggy
client can ever cause any damage by sending data after NBD_CMD_DISC.
Doing a shutdown(SHUT_RD) doesn't really help, because the buggy client
might have sent more commands after the NBD_CMD_DISC before you get to
the shutdown(); indeed they might be in the same TCP packet.

--
Alex Bligh


