[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] Question about the expected behaviour of nbd-server for async ops



Wouter,

--On 29 May 2011 08:50:38 +0200 Wouter Verhelst <w@...112...> wrote:

Handles can be reused only once the command in question is completed.

You may process commands out of order, and reply out of order,
save that
a) all write commands *completed* before you process a REQ_FLUSH
   must be written to non-volatile storage prior to completing
   that REQ_FLUSH (though apparently you should, if possible, make
   this true for all write commands *received*, which is a stronger
   condition) [Ignore this if you don't set SEND_REQ_FLUSH]

We already implement that stronger condition, because writes are handled
in the way they are received.

Yes

It shouldn't be too hard to implement when
disordered handling of requests is done, either: stop handling incoming
requests when you receive a flush request; flag all outstanding requests
so you know when the flush can be done (after which you can start
handling incoming requests again); and handle the flush when all flagged
requests have been handled.

Stopping handling requests is more than you need to do (it's not
a barrier) and may be undesirable given how often you get a REQ_FLUSH
with ext4 (for instance). In a threaded model, to satisfy the weaker
condition, a new thread could do the flush and reply. To satisfy
the stronger condition, a new thread could do wait for any already queued
writes to complete (allowing new ones into the queue), then flush and
reply.

Slightly surprisingly, the fsdevel folk's answer to this is that you
can disorder both reads and writes and do what is natural, i.e. do
not maintain ordering. A file system which cares about the result
should not issue reads of blocks for which the writes have not
completed.

Interesting to know.

Though as per my follow-up, I'd avoid this.

>     For a+b how does one report write errors that only appear after
>     the reply? Report them in the next FLUSH request?

You don't. To be safe, I'd error every write (i.e. turn the medium
read only).

I don't think errors that appear after the reply are possible in the
case of b (they are in the case of a, obviously)? Or what am I missing?

I think Goswin is asking "assume no FUA. I get a write request, and
it succeeds (perhaps because I write it to my own internal
cache). I subsequently decide to write that cache to disk, (perhaps
because I REQ_FLUSH, or perhaps decide to flush for my own reasons)
but that errors, perhaps because of an underlying disk error. How
do I report that error?"

> * NBD_CMD_DISC: Wait for all pending requests to finish, close socket

You should reply to all pending requests prior to closing the socket
I believe, mostly as it's polite. I believe the current client doesn't
send a disconnect until all replies are in,

I believe so too, yes.

[...]
and I also think the server may behave a little badly here.

How so?

I /believe/ what happens is that the server just closes the socket and
quits. I think it does not wait until any unsent replies have been sent
(i.e replies that have been generated and are queued in the socket
layer), which may result in them being lost. This is just code reading
rather than testing. It's not a problem with the current client because
it waits for all replies to come in before sending NBD_CMD_DISC.

>   Should this flush data before closing the socket? And if so what if
>   there is an error on flush? I guess clients should send NBD_CMD_FLUSH
>   prior to NBD_CMD_DISC if they care.

No, you should not rely on this happening. Even umount of an ext2 volume
will not send NBD_FLUSH where kernel, client, and server support it.
You don't need to write it then and there (in fact there is no 'then
and there' as an NBD_CMD_DISC has no reply),

It does have one -- the FIN packet. But yeah, it's not an
application-layer reply, that much is true.

Which is mildly irritating, particularly if you want to make sure the
client doesn't do other stuff until you've processed the disconnect.

>   What if there are more requests after this while waiting for pending
>   requests to finish? Should they be ignored or return an error?

I believe it is an, um, undocumented implicit assumption that no
commands are sent after NBD_CMD_DISC is sent. The current server
just closes the socket, which will probably result in an EPIPE
upstream if the FIN packet gets back before these other commands
are written.

The client will flush its outgoing queue before sending a disconnect
request. Indeed, if it didn't do that, badness would ensue.

Oh sure. I think Goswin was asking "can it change its mind and send
other stuff afterwards?" - the answer is no, NBD_CMD_DISC must be
the last command sent.

I might try to improve the documentation. Ideally the protocol should
be written up like an IETF draft (which I might just do).

--
Alex Bligh



Reply to: