Re: [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension
On 29 Mar 2016, at 04:56, Eric Blake <eblake@...696...> wrote:
> The existing transmission phase protocol is difficult to sniff,
> because correct interpretation of the server stream requires
> context from the client stream (or risks false positives if
> data payloads happen to contain the protocol magic numbers). It
> also prohibits the ability to do short reads, or to return a
> read error without also sending length bytes of data.
>
> Remedy this by adding a new global/client flag negotiation,
> which affects the response of the NBD_CMD_READ command, and sets
> forth rules for how future command responses must behave when
> they carry a data payload.
>
> Signed-off-by: Eric Blake <eblake@...696...>
> ---
> doc/proto.md | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 123 insertions(+)
>
> diff --git a/doc/proto.md b/doc/proto.md
> index 44579fc..f687e3e 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -209,6 +209,10 @@ same value for handle as was sent by the client for each request that
> the server is replying to, so that the client may correlate which
> request is receiving a response.
>
> +Note that it is impossible to tell by reading just the server traffic
> +whether a data field will be present. To remedy this, the experimental
> +`Structured Reply` extension has been introduced; see below.
> +
> ## Values
>
> This section describes the value and meaning of constants (other than
> @@ -231,6 +235,8 @@ the first magic number.
> - bit 1, `NBD_FLAG_NO_ZEROES`; if set, and if the client replies with
> `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST NOT
> send the 124 bytes of zero at the end of the negotiation.
> +- bit 2, `NBD_FLAG_STRUCTURED_REPLY`; defined by the experimental
> + `Structured Reply` extension; see below.
>
> The server MUST NOT set any other flags, and SHOULD NOT change behaviour
> unless the client responds with a corresponding flag. The server MUST
> @@ -265,6 +271,8 @@ receiving the global flags from the server.
> - bit 1, `NBD_FLAG_C_NO_ZEROES`; MUST NOT be set if the server did not
> set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124
> bytes of zeroes at the end of the negotiation.
> +- bit 2, `NBD_FLAG_C_STRUCTURED_REPLY`; defined by the experimental
> + `Structured Reply` extension; see below.
>
> Clients SHOULD NOT set any other flags; the server MUST drop the
> connection if the client sets an unknown flag, or a flag that does
> @@ -435,6 +443,10 @@ The following request types exist:
> signalling no error), the server MUST immediately close the
> connection; it MUST NOT send any further data to the client.
>
> + The experimental `Structured Reply` extension changes the set of
> + valid replies, in part to allow recovery after a partial read; see
> + below.
> +
> * `NBD_CMD_WRITE` (1)
>
> A write request. Length and offset define the location and amount of
> @@ -609,6 +621,117 @@ option reply type.
> message if they do not also send it as a reply to the
> `NBD_OPT_SELECT` message.
>
> +### `Structured Reply` extension
> +
> +Some major downsides of the default reply to `NBD_CMD_READ` is that it
> +is not possible to support partial reads (the command must succeed or
> +fail as a whole, and len bytes of data must be sent even on an error,
> +unless the connection is closed); nor is it possible to decode the
> +server traffic without also knowing what pending read requests were
> +sent by the client.
"Some major downsides is" does not agree grammatically, and this
sentence is pretty convoluted.
How about:
Two of the major downsides of the default reply to `NBD_CMD_READ` (without
structured replies) are as follows. Firstly, it is not possible to support
partial reads (the command must succeed or fail as a whole, and len bytes
of data must be sent even on an error unless the connection is closed).
Secondly, it is not possible to decode the server traffic without also
knowing what pending read requests were sent by the client.
> +To remedy this, a `Structured Reply` extension is envisioned. This
if it's actually part of the standard (i.e. if this diff is accepted)
then its more than 'envisioned', it's 'specified', or 'available'
or similar.
> +extension adds a global flag, a client flag, a new reply type during
> +the transmission phase, and alters the set of valid replies to an
> +existing command.
> +
> +* `NBD_FLAG_STRUCTURED_REPLY` (bit 2)
> +
> + This is a global flag; if set, and if the client replies with
> + `NBD_FLAG_C_STRUCTURED_REPLY` in the client flags field, the server
> + MUST use structured replies to the `NBD_CMD_READ` command.
> +
> +* `NBD_FLAG_C_STRUCTURED_REPLY` (bit 2)
> +
> + This is a client flag; MUST NOT be set if the server did not set
*it* MUST not be set ...
> + `NBD_FLAG_STRUCTURED_REPLY`. If set, the server must use structured
> + replies to the `NBD_CMD_READ` command.
> +
> +* Transmission phase
> +
> + The transmission phase includes a third message type: the structured
> + reply. If `NBD_FLAG_C_STRUCTURED_REPLY` was negotiated, then the
> + normal server reply will never contain a data payload, and all
> + server replies that send data will be in the following form:
> +
> + S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
> + S: 32 bits, error
> + S: 64 bits, handle
> + S: 32 bits, length of payload (unsigned)
> + S: *length* bytes of payload data
Unless I'm missing something this doesn't entirely solve the problem.
Imagine you are implementing NBD_CMD_READ (with structured reply)
and are asked to read 4G of data. 1G in you find an error. You can't
set the error ahead of time as you don't know (yet) there is an
error. By the time you discover, you've already streamed 1G of data.
What do you do now?
And this section seems at odds with the section below (starting
"Conversely" which describes an offset/length scheme not detailed
above.
I think you are saying that there can be one or more of these
structured replies in reference to any request, in which
case it should be made clearer, and not wait until NBD_CMD_READ.
> + If `NBD_FLAG_C_STRUCTURED_REPLY` was not negotiated, then the normal
> + server reply with a data payload will be used for `NBD_CMD_READ`;
> + but any other replies with a data payload will still use a
> + structured reply (that is, only `NBD_CMD_READ` is allowed to send
> + data in the non-structured form, and negotiating
> + `NBD_FLAG_C_STRUCTURED_REPLY` only affects replies to
> + `NBD_CMD_READ`). This implies that any new commands that require
> + data in the reply should be gated by their own new global and client
> + flag. A server MAY refuse to allow a client that negotiates a
> + command that requires a structured reply, but does not also
> + negotiate `NBD_FLAG_C_STRUCTURED_REPLY`.
> +
> + In the generic form, the length field of a structured response MAY
> + be zero if there is no data payload, and the error field may be set
> + regardless of the length field (although where possible, the server
> + SHOULD use a length of zero when setting the error field). However,
> + particular commands may document additional restrictions regarding
> + what forms a valid response (for example, a structured response to
> + `NBD_CMD_READ` MUST NOT set the error field, and MUST have a
> + non-zero length of at least 9).
> +
> +* `NBD_CMD_READ`
> +
> + If `NBD_FLAG_C_STRUCTURED_REPLY` was not negotiated, then a read
> + request MUST always be answered by a single response (with magic
> + 0x67446698 `NBD_REPLY_MAGIC`); the response MUST include length
> + bytes of data according to the client's request, although those
> + bytes MAY be invalid if an error is returned, and the connection
> + MUST if an error occurs after a header claiming no error.
Word(s) apparently missing after "MUST" on last line. "MUST be
closed by the server" I think.
> +
> + Conversely, if the `NBD_FLAG_C_STRUCTURED_REPLY` was negotiated, the
> + response MUST be a sequence of zero or more structured replies (with
> + magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), followed by a
> + concluding normal response (0x67446698 `NBD_REPLY_MAGIC`), where the
> + final response MUST NOT have a data payload; all responses in the
> + sequence use the same handle from the client. The payload of each
> + intermediate structured reply, called a read chunk, MUST be of the
> + following form:
> +
> + S: 64 bits, offset (unsigned)
> + S: (*length* - 8) bytes of data
> +
> + Note that the amount of data returned in a read chunk is determined
> + by the length field of the structured reply, and not the original
> + length of the client's request. If the server sends a single read
> + chunk for a successful read, the server's length will be the
> + client's length plus 8, because the server must account for the
> + offset field in its reply. Similarly, a successful client request
> + for a read of 2^32-8 or more bytes MUST be split into at least two
> + read chunks, so that the length field does not suffer from overflow.
OK, so the error is not sent with the chunk (fine), but at the end.
> +
> + The server MUST ensure that each read chunk lies within the original
> + offset and length of the original client request, MUST NOT send read
> + chunks that would cover the same offset more than once, and MUST
> + send at least one byte of data in addition to the offset field of
> + each read chunk. The server MAY send read chunks out of order, and
> + may interleave other responses between read replies.
I can see why it's attractive from the server's point of view to
support out of order replies - for instance an NBD server with a back
end like Ceph could use this to launch requests simultaneously to
multiple backend stores.
However, theoretically a server can now send single one byte packets
back, which the client would have to reconstruct.
Also, given new commands aren't available unless you support structured
replies, you now have to support reassembly of replies (if you want
to use new features) even if all your reads are (e.g.) 1k.
Can I suggest that the client should be able to specify a minimum
'power of 2' chunk boundary, e.g. if the client says '1k', and its
requests do not cross a 1k boundary, the server will guarantee to
return them as a single chunk?
> The server
> + MUST NOT set the error field of a read chunk; if an error occurs, it
> + MAY immediately end the sequence of structured response messages,
> + MUST send the error in the concluding normal response, and SHOULD
> + keep the connection open. The final non-structured response MUST
> + set an error unless the sum of data sent by all read chunks totals
> + the original client length request.
add "and data for the entire range requested has been supplied." (I
know this is technically implied by the fact data cannot be duplicated).
> +
> + The client SHOULD immediately close the connection if it detects
> + that the server has sent an offset more than once (whether or not
> + the overlapping data claimed to have the same contents), or if
> + receives the concluding normal reply without an error set but
> + without all bytes covered by read chunk(s). A future extension may
> + add a command flag that would allow the server to skip read chunks
> + for portions of the file that read as all zeroes.
> +
> ## About this file
>
> This file tries to document the NBD protocol as it is currently
> --
> 2.5.5
>
>
> ------------------------------------------------------------------------------
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140
> _______________________________________________
> Nbd-general mailing list
> Nbd-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nbd-general
>
--
Alex Bligh
Reply to: