[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [PATCH] Docs: improve description of disconnection methods



On 13 Apr 2016, at 16:39, Eric Blake <eblake@...696...> wrote:

>> 
>> I wouldn't want to loose that. So if the client sends NBD_CMD_DISC
>> without waiting for all his inflight commands to complete, those
>> inflight commands may not be executed at all, because the server
>> is free to process commands in any order. It's going to make
>> server design very awkward if you can only process /some/ commands
>> out of order.
> 
> We already have that constraint - commands with NBD_CMD_FLAG_FUA must be
> processed in a particular order,

No, that is not true. Commands with NBD_CMD_FUA set may be
processed in any order with respect to other commands. They just
MUST NOT reply until the data within them is written to the disk.
This is exactly per the linux block layer.

Here's what the spec says.

 "A server MUST NOT reply to a command that has NBD_CMD_FLAG_FUA set
 in its command flags until the data (if any) written by that command
 is persisted to non-volatile storage."

It is perfectly possible to reorder a FUA write to run before
a another write that is issued first, or behind another write
that is issued after.

FUA is completely independent of ordering.

> and NBD_CMD_FLUSH must be processed in
> a particular order.

No, that's not true either (but you're closer).

Here's what the spec says (again this is the same as the Linux block
layer:

"All write commands (that includes NBD_CMD_WRITE, NBD_WRITE_ZEROES
and NBD_CMD_TRIM) that the server completes (i.e. replies to) prior
to processing to a NBD_CMD_FLUSH MUST be written to non-volatile
storage prior to replying to that NBD_CMD_FLUSH"

That is NOT imposing an ordering constraint on processing commands.
It's saying that when you reply to NBD_CMD_FLUSH all the writes
that have been *replied to* must be completed. So you are at
liberty to process the flush before all your pending writes and
not flush them if you want. You can thus reorder your flushes
as much as you like; if the client doesn't want them reordered
it should not issue the flush until after the writes have
replied.

Strange but true, but this is how the Linux block layer works.

(BTW this was precisely the point I was trying to clarify
earlier on, and I had an additional 'SHOULD' behaviour to
advise against this, and people wanted it out).

Here's what
https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt
has to say:

> Explicit cache flushes
> ----------------------
> 
> The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
> the filesystem and will make sure the volatile cache of the storage device
> has been flushed before the actual I/O operation is started.  This explicitly
> guarantees that previously completed write requests are on non-volatile
> storage before the flagged bio starts. In addition the REQ_FLUSH flag can be
> set on an otherwise empty bio structure, which causes only an explicit cache
> flush without any dependent I/O.  It is recommend to use
> the blkdev_issue_flush() helper for a pure cache flush.

Note that it only talks about COMPLETED writes.

There is no restriction on reordering writes.

> Requiring NBD_CMD_DISC to be processed last is not
> much different than these other two situations (well, different in that
> the other two only have to guarantee that commands _with replies_ have
> hit permanent storage, not ALL commands received).

Seems to me very different as I don't think those ordering constraints
you mention actually exist! Certainly nbd-server.c does not implement
them and neither does gonbdserver.

But:

>> Another alternative would be to make the server
>> wait for all commands to complete before acting on the disconnect
>> (as opposed to or in addition to making the client wait to send
>> it). I'm reasonably relaxed about which one we do, but I think
>> we should do one or the other (or at least say that if the
>> client sends NBD_CMD_DISC without waiting for commands to complete
>> then those commands must not be executed). There are thus
>> various choices for NBD_CMD_DISC.
> 
> I think it is perfectly fair to put the requirement on the server that
> it MUST wait until all inflight commands have been responded to before
> disconnecting;

I suspect we'll end up with that.

> and at the same time that a client SHOULD wait until
> there are no inflight commands before sending NBD_CMD_DISC.

I like that. I think Wouter doesn't.

>> I think the option haggling phase is different (or rather need
>> not be the same). Fundamentally options MUST be processed in
>> the order they are issued,
> 
> No, we already explicitly state that options may be replied to
> out-of-order, and that the burden is on the client to wait for
> particular replies before sending another option of the same type.

Yeah so Wouter pointed out :-(

>> and there is only ever one in
>> flight at a time.
> 
> No, a client can batch send a bunch of options before waiting for any
> replies.

Yeah so Wouter pointed out :-(

>>> 
>>> It might be good to add a "NBD_REP_ERR_NOSERVICE" error, for "server
>>> shutting down"?
>> 
>> I think that's probably a good idea.
> 
> That only works during handshake phase; you may also want to add a
> specific error value for use during transmission phase (maybe
> ECONNRESET? ENOLINK? EPIPE? ESHUTDOWN?)

One of those too, yes.

--
Alex Bligh




Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail


Reply to: