
Re: [Nbd] Question about the expected behaviour of nbd-server for async ops



On Sat, May 28, 2011 at 05:35:22PM +0100, Alex Bligh wrote:
> Goswin,
> 
> --On 28 May 2011 16:37:12 +0200 Goswin von Brederlow <goswin-v-b@...186...> 
> wrote:
> 
> My view is that this is derived from the linux request layer, in
> which case (having asked much the same question on fsdevel
> a couple of days ago) the answers appear to be as follows:
> 
> > 1) Order of replies
> >
> > Currently nbd-server works all requests in order and replies in
> > order. Since every request/reply has a handle to uniquely pair them I
> > assume replying to requests out of order is allowed and will (most
> > likely) be handled correctly by existing clients.
> 
> Handles can be reused only once the command in question is completed.
> 
> You may process commands out of order, and reply out of order,
> save that
> a) all write commands *completed* before you process a REQ_FLUSH
>    must be written to non-volatile storage prior to completing
>    that REQ_FLUSH (though apparently you should, if possible, make
>    this true for all write commands *received*, which is a stronger
>    condition) [Ignore this if you don't set SEND_REQ_FLUSH]

We already implement that stronger condition, because writes are handled
in the order in which they are received. It shouldn't be too hard to
implement once requests are handled out of order, either: when you
receive a flush request, stop handling incoming requests and flag all
outstanding requests, so you know when the flush can be done; handle the
flush once all flagged requests have been handled; then start handling
incoming requests again.
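
To make the scheme above concrete, here is a minimal sketch in Python. It
is a toy model, not nbd-server code: the class FlushBarrier, its method
names and the "write"/"flush" strings are all made up for illustration,
and a real server would call fsync() (or similar) where the comment says
so.

```python
from collections import deque


class FlushBarrier:
    """Toy model: writes may complete in any order, but a flush is only
    completed after every write received before it has completed."""

    def __init__(self):
        self.in_flight = set()      # handles of writes being processed
        self.flagged = set()        # writes the pending flush waits for
        self.pending_flush = None   # handle of the waiting flush, if any
        self.held = deque()         # requests received while a flush waits
        self.replies = []           # handles, in the order replies go out

    def receive(self, kind, handle):
        if self.pending_flush is not None:
            # A flush is waiting: stop handling incoming requests.
            self.held.append((kind, handle))
        elif kind == "write":
            self.in_flight.add(handle)
        elif kind == "flush":
            self.pending_flush = handle
            # Flag everything outstanding at the time the flush arrived.
            self.flagged = set(self.in_flight)
            self._maybe_finish_flush()

    def complete_write(self, handle):
        self.in_flight.discard(handle)
        self.flagged.discard(handle)
        self.replies.append(handle)
        self._maybe_finish_flush()

    def _maybe_finish_flush(self):
        if self.pending_flush is None or self.flagged:
            return
        # All flagged writes are done; a real server would fsync() here.
        self.replies.append(self.pending_flush)
        self.pending_flush = None
        # Resume handling the requests that were held back, in order.
        held, self.held = self.held, deque()
        for kind, handle in held:
            self.receive(kind, handle)
```

For example, after receiving write 1, write 2, flush 3 and write 4, the
model lets writes 1 and 2 complete in either order, replies to flush 3
only after both, and only then starts on write 4.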

[...]
> > 2) Overlapping requests
> >
> > I assume that requests may overlap. For example a client may write a
> > block of data and read it again before the write was ACKed. This would
> > be unexpected behaviour from a proper client but not forbidden.
> 
> Correct
> 
> > As such
> > the server has to internally ensure the proper order of overlapping
> > requests.
> 
> Slightly surprisingly, the fsdevel folk's answer to this is that you
> can disorder both reads and writes and do what is natural, i.e. do
> not maintain ordering. A file system which cares about the result
> should not issue reads of blocks for which the writes have not
> completed.

Interesting to know.

[...]
> >   + not NBD_CMD_FLAG_FUA:
> >     a) reply when the data has been received
> >     b) reply when the data has been committed to cache (write() returned)
> >     c) reply when the data has been committed to physical medium
> 
> You may do any of those. Provided you will write the data "eventually"
> (i.e. when you receive a REQ_FLUSH or a disconnect).
> 
> >     For a+b how does one report write errors that only appear after
> >     the reply? Report them in the next FLUSH request?
> 
> You don't. To be safe, I'd error every write (i.e. turn the medium
> read only).

I don't think errors that appear after the reply are possible in the
case of b (they are in the case of a, obviously)? Or what am I missing?
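
For reference, the three reply points a), b) and c) can be sketched as
follows. This is only an illustration in Python: handle_write, the
reply callback and the policy names are hypothetical, and os.fsync()
stands in for whatever the server would use to push data to the medium.

```python
import os
import tempfile


def handle_write(fd, offset, data, reply, policy):
    """Write data at offset; call reply() at the point chosen by policy:
    "received" (a), "cached" (b) or "durable" (c)."""
    if policy == "received":
        reply()                        # (a) data received, nothing written yet
    os.lseek(fd, offset, os.SEEK_SET)
    os.write(fd, data)                 # short writes are ignored in this sketch
    if policy == "cached":
        reply()                        # (b) write() has returned
    elif policy == "durable":
        os.fsync(fd)                   # ask the OS to push the data to the device
        reply()                        # (c) reply only after fsync() returned


# Usage: reply only after the data has been synced.
events = []
with tempfile.TemporaryFile() as f:
    handle_write(f.fileno(), 0, b"abc",
                 lambda: events.append("replied"), "durable")
    f.seek(0)
    stored = f.read()
```

In case a) an error from the write itself can only show up after the
reply has already gone out, which is the situation the question above
is about.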

[...]
> > * NBD_CMD_DISC: Wait for all pending requests to finish, close socket
> 
> You should reply to all pending requests prior to closing the socket
> I believe, mostly as it's polite. I believe the current client doesn't
> send a disconnect until all replies are in,

I believe so too, yes.

[...]
> and I also think the server may behave a little badly here.

How so?

> >   Should this flush data before closing the socket? And if so what if
> >   there is an error on flush? I guess clients should send NBD_CMD_FLUSH
> >   prior to NBD_CMD_DISC if they care.
> 
> No, you should not rely on this happening. Even umount of an ext2 volume
> will not send NBD_FLUSH where kernel, client, and server support it.
> You don't need to write it then and there (in fact there is no 'then
> and there' as an NBD_CMD_DISC has no reply),

It does have one -- the FIN packet. But yeah, it's not an
application-layer reply, that much is true.

> but you cannot guarantee *at all* that you will have received any sort
> of flush under any circumstances.

Correct. All you know is that the server will close its file handles on
disconnect.

> >   What if there are more requests after this while waiting for pending
> >   requests to finish? Should they be ignored or return an error?
> 
> I believe it is an, um, undocumented implicit assumption that no
> commands are sent after NBD_CMD_DISC is sent. The current server
> just closes the socket, which will probably result in an EPIPE
> upstream if the FIN packet gets back before these other commands
> are written.

The client will flush its outgoing queue before sending a disconnect
request. Indeed, if it didn't do that, badness would ensue.
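
A rough sketch of that client-side ordering, in Python. This is a toy
model and not the actual nbd-client or kernel code: ToyClient, its
methods and the ("REQ", handle) / ("DISC", None) tuples are invented for
illustration.

```python
from collections import deque


class ToyClient:
    """Sends queued requests, and sends the disconnect only once the
    outgoing queue is empty and every reply has come back."""

    def __init__(self, send):
        self.send = send            # callable that puts a message on the wire
        self.outgoing = deque()     # requests not yet sent
        self.awaiting = set()       # handles sent but not yet replied to
        self.disconnected = False

    def queue(self, handle):
        if self.disconnected:
            raise RuntimeError("no requests may follow the disconnect")
        self.outgoing.append(handle)

    def pump(self):
        # Flush the outgoing queue onto the wire.
        while self.outgoing:
            handle = self.outgoing.popleft()
            self.awaiting.add(handle)
            self.send(("REQ", handle))

    def on_reply(self, handle):
        self.awaiting.discard(handle)

    def try_disconnect(self):
        self.pump()
        if self.awaiting:
            return False            # replies still outstanding, try again later
        self.send(("DISC", None))
        self.disconnected = True
        return True
```

With this ordering nothing is ever written after the disconnect request,
so the EPIPE case described above does not arise.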

[...]
-- 
The volume of a pizza of thickness a and radius z can be described by
the following formula:

pi zz a


