[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension



Hi all,

(side note: this seems to be mostly an NBD discussion at this point.
Does qemu-devel need to be in the loop before we've finished that? I
don't care either way, but then I don't want to bore or annoy people...)

On Wed, Mar 30, 2016 at 11:45:04AM -0600, Eric Blake wrote:
> On 03/30/2016 12:50 AM, Alex Bligh wrote:
[...]
> So taking your sentence at face value, yes there is another solution
> possible: require that any NBD_OPT_* haggling used to negotiate the use
> of any other command with a structured reply MUST fail if the client has
> not first negotiated NBD_OPT_STRUCTURED_READ, but leaves the connection
> open so the client can continue option haggling.  That way, the only way
> to end option haggling with the new command enabled is to also enable
> structured reads.  The burden is then on the client to haggle in the
> correct order, and on the server to track haggling state when deciding
> how to answer the option requests for the new commands.
> 
> I'm a bit reluctant to put ordering requirements on option haggling
> (that option A must be turned on before option B),

Yes, me too.

> but then again, the
> SELECT extension requires NBD_OPT_SELECT to be haggled prior to
> NBD_OPT_GO, so there's precedent.

Yeah, but then, that's only because GO is meant to transition from
negotiation to transmission, so it needs to be after *any* other
negotiation; anything else would defeat its purpose.

Requiring structured read after negotiating other structured commands
could easily be done by saying that it's an error to leave the
negotiation phase with "some other" structured command negotiated, but
not structured read.

On the other hand, I feel compelled to point out that we only find
ourselves in this hole because we've coupled the structured reply with
the read command. That may have looked like a good idea at the time, but
obviously it isn't. If we just have it be "negotiate the structured
reply format" rather than "the structured read reply", and state that
once negotiated, the structured reply format is required for any command
with a payload, we're done.

Since only the read reply has a payload at this point in time, you don't
really need to special-case it, anyway.

> I also am thinking of proposing an extension for option haggling to
> communicate minimum/preferred alignments and maximum don't-fragment
> sizing, and that option would have to be enabled before
> OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it would change the
> NBD_REP_SERVER layout in response to those option requests), which
> would be another case where option A affects the behavior of option B.

I reused the NBD_REP_SERVER command in reply to the NBD_OPT_SELECT
command since its purpose seems fairly similar ("send metadata about an
export to the client"), but that may have been a mistake. It certainly
wasn't meant that if you say NBD_OPT_SELECT first and then go
NBD_OPT_LIST, that the NBD_REP_SERVER reply to NBD_OPT_LIST should be
the one as specified in the SELECT extension.

[...]
> > These multiple error chunks are neat. However, I suspect lazy implementors
> > may just send an error without an offset.
> 
> Any ideas are appreciated on how to word it to suggest that servers
> SHOULD try to give full details; but I think we want the fallback to the
> no-offset case because some situations will not have an offset.

I'm wary of making the spec too complicated. Adding such language might
make it unclear. Since as you say it can't be a hard-and-fast rule, I'd
prefer that we just trust implementor's judgement on this.

Not sending an offset (even if it would be possible) can certainly be
the better choice in some cases -- a protocol description can never know
all the trade-offs and choices a particular implementor may want or need
to make.

> >> +    The server MAY send additional chunks or offset error replies, if
> >> +    `NBD_REPLY_FLAG_DONE` was not set, but MUST ensure the final reply
> >> +    also reports an error (that is, the final reply MUST NOT use
> >> +    `NBD_REPLY_TYPE_NONE`), and MAY reuse an offset reported earlier
> >> +    in constructing the final reply.
> > 
> > I'm not sure I get that bit. Why don't you make an errorred reply simply
> > one that contains one or more error chunks. An errorred reply need not contain
> > all the data requested (though each chunk must be complete). A reply that
> > isn't errorred needs not contain all the data requested. Why do you
> > need anything stronger than that? So if you have a parallelised server which
> > is simply sending several reads in parallel (think Ceph) it sends the
> > result from each thread, possibly followed by an error packet, and some
> > other thread notices when all of these have completed and sends a
> > NBD_REPLY_TYPE_NONE packet (always, error or not) to close of use of the
> > handle. This seems perfectly natural and no harder for the client to deal
> > with, but you are prohibiting it.
> 
> I was thinking that it's easier on the client if the final chunk always
> serves as a reliable indicator of success (OFFSET_DATA, OFFSET_HOLE,
> NONE) or error (ERROR, ERROR_OFFSET).

The client will already need to keep state to reassemble a fragmented
and out-of-order read reply, anyway. If that's already the case, adding
in the requirement to *also* keep track of error state (which could in
the simplest case be a single bit for a client which doesn't care about
offsets or error count) isn't that much more of an issue.

I'm with Alex on this one.

[...]
> >> +    the client's length request is larger than 65,536 bytes (or if a
> >> +    later extension adds a way to negotiate a larger maximum fragment
> >> +    size), the server MAY reject the command with `EOVERFLOW`.  The
> >> +    `EOVERFLOW` error MUST NOT be used if the `NBD_CMD_FLAG_DF` flag
> >> +    was not set, or if the requested length is no larger than 65,536.
> 
> I'm also wondering if the wording should be switched along the lines of:
> 
> If the flag is set and the server deems the request to be too large, the
> server MAY reject the command with `EOVERFLOW`; however, the server MUST
> NOT reject a request that is no larger than 65,536 bytes in this manner.

Yes, that seems better.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Reply to: