
[Nbd] [PATCH v3 5/5] RFC: doc: Promote structured reply out of experimental



Should not be applied until we have a working implementation,
in case we need to tweak things.

Demonstrates the amount of word-smithing required to promote
structured replies to non-experimental.  In many cases, I was
able to preserve entire paragraphs (but sometimes reflowed at
different indentation).

Signed-off-by: Eric Blake <eblake@...696...>
---
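To make the "Structured reply message chunk" layout concrete for review, here is a minimal C sketch of how a client might decode the 20-byte chunk header. It assumes the whole header has already been read into a buffer and that, as elsewhere in NBD, integers travel in network (big-endian) byte order. `struct nbd_chunk_header`, `be_read()` and `nbd_parse_chunk_header()` are hypothetical names and do not come from any existing implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as given in the proto.md text of this patch. */
#define NBD_STRUCTURED_REPLY_MAGIC 0x668e33efu
#define NBD_REPLY_FLAG_DONE        (1u << 0)

/* Hypothetical in-memory form of the 20-byte chunk header. */
struct nbd_chunk_header {
    uint16_t flags;
    uint16_t type;
    uint64_t handle;
    uint32_t length;    /* number of payload bytes after the header */
};

/* Read an nbytes-wide big-endian (network order) unsigned integer. */
static uint64_t be_read(const unsigned char *p, size_t nbytes)
{
    uint64_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode one chunk header from exactly 20 bytes of server traffic.
 * Returns false on a bad magic or on any flag other than
 * NBD_REPLY_FLAG_DONE; the client SHOULD then close the connection. */
static bool nbd_parse_chunk_header(const unsigned char buf[20],
                                   struct nbd_chunk_header *out)
{
    if (be_read(buf, 4) != NBD_STRUCTURED_REPLY_MAGIC)
        return false;
    out->flags  = (uint16_t)be_read(buf + 4, 2);
    out->type   = (uint16_t)be_read(buf + 6, 2);
    out->handle = be_read(buf + 8, 8);
    out->length = (uint32_t)be_read(buf + 16, 4);
    return (out->flags & ~NBD_REPLY_FLAG_DONE) == 0;
}
```

Because *length* is always present, a client can skip or buffer exactly that many payload bytes before reading the next header. This is the context-free framing that the simple reply lacks.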
 doc/proto.md | 621 ++++++++++++++++++++++++++---------------------------------
 1 file changed, 273 insertions(+), 348 deletions(-)
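The per-type length rules under "Structured reply types" can be checked mechanically before any payload is interpreted. The C sketch below uses a hypothetical helper, `nbd_chunk_length_ok()`, and checks only *length* plus the requirement that `NBD_REPLY_TYPE_NONE` carry `NBD_REPLY_FLAG_DONE`. It deliberately does not inspect payload contents, such as a nonzero error value or a non-zero hole size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reply type values as listed in the patch. */
enum nbd_reply_type {
    NBD_REPLY_TYPE_NONE         = 0,
    NBD_REPLY_TYPE_ERROR        = 1,
    NBD_REPLY_TYPE_ERROR_OFFSET = 2,
    NBD_REPLY_TYPE_OFFSET_DATA  = 3,
    NBD_REPLY_TYPE_OFFSET_HOLE  = 4,
};

#define NBD_REPLY_FLAG_DONE (1u << 0)

/* Returns false for an unknown type or a chunk whose length breaks the
 * rules for its type; the client SHOULD then close the connection. */
static bool nbd_chunk_length_ok(uint16_t type, uint16_t flags, uint32_t length)
{
    switch (type) {
    case NBD_REPLY_TYPE_NONE:
        /* No payload, and only valid as the final chunk. */
        return length == 0 && (flags & NBD_REPLY_FLAG_DONE);
    case NBD_REPLY_TYPE_ERROR:
        return length == 4;     /* 32-bit error */
    case NBD_REPLY_TYPE_ERROR_OFFSET:
        return length == 12;    /* 32-bit error + 64-bit offset */
    case NBD_REPLY_TYPE_OFFSET_DATA:
        return length >= 9;     /* 64-bit offset + at least one byte */
    case NBD_REPLY_TYPE_OFFSET_HOLE:
        return length == 12;    /* 64-bit offset + 32-bit hole size */
    default:
        return false;
    }
}
```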

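The `NBD_CMD_READ` constraints in this patch come down to interval bookkeeping on the client. Data chunks may arrive in any order, must stay inside the requested range and must not overlap, and before success they must cover the whole request. The C sketch below shows one way to track this. The `read_tracker` type, its helper functions and the fixed `MAX_EXTENTS` cap are assumptions made for brevity, and a real client would also have to fold in error chunks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 64   /* hypothetical cap, to keep the sketch simple */

/* Tracks which parts of one read request have been described by data
 * chunks (NBD_REPLY_TYPE_OFFSET_DATA or NBD_REPLY_TYPE_OFFSET_HOLE). */
struct read_tracker {
    uint64_t req_offset, req_length;
    size_t n;
    struct { uint64_t start, len; } ext[MAX_EXTENTS];  /* sorted by start */
};

static void tracker_init(struct read_tracker *t, uint64_t off, uint64_t len)
{
    t->req_offset = off;
    t->req_length = len;
    t->n = 0;
}

/* Record one data chunk; returns false if it is empty, lies outside the
 * request, or overlaps an earlier chunk (all invalid per the patch). */
static bool tracker_add(struct read_tracker *t, uint64_t start, uint64_t len)
{
    uint64_t req_end = t->req_offset + t->req_length;
    if (len == 0 || start < t->req_offset || start > req_end ||
        len > req_end - start || t->n == MAX_EXTENTS)
        return false;
    size_t i = 0;
    while (i < t->n && t->ext[i].start < start)
        i++;
    /* Only the neighbors of the insertion point can overlap. */
    if (i > 0 && t->ext[i - 1].start + t->ext[i - 1].len > start)
        return false;
    if (i < t->n && start + len > t->ext[i].start)
        return false;
    for (size_t j = t->n; j > i; j--)
        t->ext[j] = t->ext[j - 1];
    t->ext[i].start = start;
    t->ext[i].len = len;
    t->n++;
    return true;
}

/* True once the recorded chunks cover the whole request without gaps,
 * which the server MUST achieve before reporting success. */
static bool tracker_complete(const struct read_tracker *t)
{
    uint64_t pos = t->req_offset;
    for (size_t i = 0; i < t->n; i++) {
        if (t->ext[i].start != pos)
            return false;
        pos += t->ext[i].len;
    }
    return pos == t->req_offset + t->req_length;
}
```

A sorted array keeps the overlap test to the two neighbors of the insertion point. A client expecting many small chunks might prefer a tree or a per-block bitmap instead. Setting `NBD_CMD_FLAG_DF` sidesteps most of this, because at most one data chunk can then arrive.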
diff --git a/doc/proto.md b/doc/proto.md
index cd59d81..7bc65f8 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -182,29 +182,32 @@ required to.

 ### Transmission

-There are two message types in the transmission phase: the request,
-and the simple reply.  The phase consists of a series of transactions,
-where the client submits requests and the server sends corresponding
-replies, with a single simple reply message per request, and continues
-until either side closes the connection.
+There are three message types in the transmission phase: the request,
+the simple reply, and the structured reply chunk.  The phase consists
+of a series of transactions, where the client submits requests and the
+server sends corresponding replies, either a single simple reply or a
+series of one or more structured reply chunks, the last of which
+carries a concluding flag.  This phase continues until either side
+closes the connection.

 Replies need not be sent in the same order as requests (i.e., requests
-may be handled by the server asynchronously).  Clients SHOULD use a
-handle that is distinct from all other currently pending transactions,
-but MAY reuse handles that are no longer in flight; handles need not
-be consecutive.  In each reply message, the server MUST use the same
-value for handle as was sent by the client in the corresponding
-request.  In this way, the client can correlate which request is
-receiving a response.
+may be handled by the server asynchronously).  Where a reply consists
+of multiple structured reply chunks, the intermediate chunks MAY be
+reordered within constraints documented by the request, and the chunks
+MAY be interleaved with messages from other pending transactions, but
+the final chunk MUST always end the reply.  Clients SHOULD use a handle
+that is distinct from all other currently pending transactions, but MAY
+reuse handles that are no longer in flight; handles need not be
+consecutive.  In each reply message, the server MUST use the same value
+for handle as was sent by the client in the corresponding request.  In
+this way, the client can correlate each reply with its request.

 Note that it is impossible to tell by reading just the server traffic
 whether a data field of a simple reply will be present; the simple
 reply is also problematic for error handling of the `NBD_CMD_READ`
-request.  Therefore, the experimental `STRUCTURED_REPLY` extension
-creates a context-free server stream by adding an additional
-structured reply type, and documents that it is possible to have
-multiple structured reply messages (called chunks) in response to a
-single request message; see below.
+request.  Therefore, servers SHOULD support the structured reply
+extension, and "fixed newstyle" clients SHOULD use
+`NBD_OPT_STRUCTURED_REPLY` to negotiate structured replies.

 #### Request message

@@ -245,6 +248,28 @@ S: 32 bits, error (MAY be zero)
 S: 64 bits, handle  
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ`)  

+#### Structured reply message chunk
+
+  Unless explicitly documented for a given request, a structured reply
+  MUST occupy only one message (similar to a simple reply).  However,
+  some requests document that a structured reply MAY occupy multiple
+  chunks; each chunk uses a structured reply message (all with the
+  same value for "handle"), and the `NBD_REPLY_FLAG_DONE` reply flag
+  is used to identify the final chunk.
+
+  A structured reply message looks as follows:
+
+  S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
+  S: 16 bits, flags  
+  S: 16 bits, type  
+  S: 64 bits, handle  
+  S: 32 bits, length of payload (unsigned)  
+  S: *length* bytes of payload data (if *length* is non-zero)
+
+  The use of *length* in the reply allows context-free division of the
+  overall server traffic into individual reply messages; the *type*
+  field describes how to further interpret the payload.
+
 ## Values

 This section describes the value and meaning of constants (other than
@@ -288,8 +313,14 @@ immediately after the handshake flags field in oldstyle negotiation:
   schedule I/O accesses as for a rotational medium
 - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
   `NBD_CMD_TRIM` commands
-- bit 6, `NBD_FLAG_SEND_DF`; defined by the `STRUCTURED_REPLY` extension;
-  see below.
+- bit 6, `NBD_FLAG_SEND_DF`; MUST be set to 1 if structured replies
+  have been negotiated, and MUST NOT be set otherwise; that way, the
+  client can use this flag as a reliable witness of whether to expect
+  a simple reply or a structured reply to the `NBD_CMD_READ`
+  transmission request.
+
+  Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request
+  flag unless this transmission flag is set.

 Clients SHOULD ignore unknown flags.

@@ -380,7 +411,27 @@ of the newstyle negotiation.

 - `NBD_OPT_STRUCTURED_REPLY` (8)

-    Defined by the experimental `STRUCTURED_REPLY` extension; see below.
+    The client wishes to use structured replies during the
+    transmission phase.  The option request has no additional data.
+
+    The server replies with the following:
+
+    - `NBD_REP_ACK`: Structured replies have been negotiated; the
+      server MUST set the `NBD_FLAG_SEND_DF` flag in all future
+      transmission flags, and MUST use structured replies to the
+      `NBD_CMD_READ` transmission request.  Further extensions that
+      use structured replies may now be negotiated.
+    - For backwards compatibility, clients should be prepared to also
+      handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
+      will be sent.
+
+    It is envisioned that future extensions will add other new
+    requests that also require a data payload in the reply.  Such
+    extensions MUST use a structured reply, and not a simple reply.  A
+    server that supports such extensions MUST NOT advertise those
+    extensions until the client negotiates structured replies; and a
+    client MUST NOT make use of those extensions without first
+    enabling the `NBD_OPT_STRUCTURED_REPLY` extension.

 #### Option reply types

@@ -481,8 +532,13 @@ valid may depend on negotiation during the handshake phase.
   set to 1 if the client requires "Force Unit Access" mode of
   operation.  MUST NOT be set unless transmission flags included
   `NBD_FLAG_SEND_FUA`.
-- bit 1, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
-  extension; see below
+
+- bit 1, `NBD_CMD_FLAG_DF`; valid during `NBD_CMD_READ`.  The "don't
+  fragment" bit.  SHOULD be set to 1 if the client requires the server
+  to send at most one data chunk in reply.  MUST NOT be set unless the
+  transmission flags include `NBD_FLAG_SEND_DF`.  Use of this flag MAY
+  trigger an `EOVERFLOW` error chunk, if the request length is too
+  large.

 #### Request types

@@ -490,10 +546,11 @@ The following request types exist:

 * `NBD_CMD_READ` (0)

-    A read request. Length and offset define the data to be read. The
-    server MUST reply with a simple reply header, followed immediately
-    by len bytes of data, read from offset bytes into the file, unless
-    an error condition has occurred.
+    A read request. Length and offset define the data to be read. If
+    structured replies have not been negotiated, the server MUST reply
+    with a simple reply header, followed immediately by len bytes of
+    data, read from offset bytes into the file, unless an error
+    condition has occurred.

     If an error occurs, the server SHOULD set the appropriate error code
     in the error field. The server MUST then either close the
@@ -504,10 +561,79 @@ The following request types exist:
     signalling no error), the server MUST immediately close the
     connection; it MUST NOT send any further data to the client.

-    The experimental `STRUCTURED_REPLY` extension changes from a
-    simple reply to a structured reply, in part to allow recovery
-    after a partial read and more efficient reads of sparse files; see
-    below.
+    If structured replies are negotiated, then a read request MUST
+    result in a structured reply that MAY contain one or more chunks
+    (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with
+    the following additional constraints.
+
+    The server MAY split the reply into any number of data chunks
+    (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and
+    `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least
+    one byte, although to minimize overhead, the server SHOULD use
+    chunks where lengths and offsets are an integer multiple of 512
+    bytes, where possible (the first and last chunk of an unaligned
+    read being the most obvious place for an exception).  The server
+    MUST NOT send data chunks that overlap each other or any earlier
+    error chunks, and MUST NOT send chunks that describe data outside
+    the offset and length of the request, but MAY send the chunks in
+    any order (the client MUST reassemble data chunks into the correct
+    order), and MAY send additional data chunks even after reporting
+    an error chunk.  Note that a request for more than 2^32 - 8 bytes
+    MUST be split into at least two chunks, so as not to overflow the
+    length field of a reply while still allowing space for the offset
+    of each chunk.  When no error is detected, the server MUST send
+    enough data chunks to cover the entire region described by the
+    offset and length of the client's request.
+
+    To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE`
+    flag on the final data chunk (in which case it MUST NOT send any
+    further chunks), but MUST NOT do so if it would still be possible
+    to detect an error while transmitting the chunk.  If the last data
+    chunk does not carry `NBD_REPLY_FLAG_DONE`, the server MUST then
+    send either a final chunk with type `NBD_REPLY_TYPE_NONE` (and the
+    flag `NBD_REPLY_FLAG_DONE` set) to indicate success, or an error
+    chunk.
+
+    If an error is detected, the server MUST still complete the
+    transmission of any current chunk (it SHOULD use padding bytes of
+    zero for any remaining data portion of
+    `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks.
+    The server MUST include an error chunk as one of the subsequent
+    chunks, but MAY defer the error reporting behind other queued
+    chunks.  An error chunk of type `NBD_REPLY_TYPE_ERROR` implies
+    that the client MUST NOT make any assumptions about validity of
+    data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as
+    the final chunk, or be immediately followed by a chunk of type
+    `NBD_REPLY_TYPE_NONE`.  On the other hand, an error chunk of type
+    `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about
+    which earlier data chunk(s) encountered a failure, and MAY also be
+    sent in lieu of a data chunk; as such, a server MAY still usefully
+    follow it with further data chunks or further error offsets.
+    Generally, a server SHOULD NOT mix errors that carry offsets with a
+    generic error.  As long as all errors are accompanied by offsets,
+    the client MAY assume that any data chunks with no subsequent
+    error are valid, that chunks with errors are valid up until the
+    reported offset, and portions of the read that do not have a
+    corresponding data chunk are not valid.  If the final data or
+    error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then
+    the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to
+    complete the reply, but the client MUST NOT treat this type as
+    success if an earlier error chunk was sent.
+
+    A client MAY close the connection if it detects that the server
+    has sent invalid chunks (such as overlapping data, or not enough
+    data before claiming success).
+
+    In order to avoid the burden of reassembly, the client MAY set the
+    `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not
+    fragment the reply.  If this flag is set, the server MUST send at
+    most one data chunk, although it MAY still send multiple chunks
+    (the remaining chunks would be error chunks or a final type of
+    `NBD_REPLY_TYPE_NONE`).  A server MAY reject a client's request
+    with the error `EOVERFLOW` if the length is too large to send
+    without fragmentation, in which case it MUST NOT send a data
+    chunk; however, the server MUST NOT use this error if the client's
+    requested length does not exceed 65,536 bytes.

 * `NBD_CMD_WRITE` (1)

@@ -574,6 +700,114 @@ The following request types exist:
     Currently one such message is known: `NBD_CMD_CACHE`, with type set to
     5, implemented by xnbd.

+#### Structured reply flags
+
+This field of 16 bits is sent by the server as part of every
+structured reply.
+
+- bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if
+  more structured reply chunks will be sent for the same client
+  request, and MUST set this bit if this is the final chunk.  This
+  flag MUST always be set in response to requests which are
+  documented as using a structured reply, but not documented as
+  permitting multiple chunks.
+
+The server MUST NOT set any other flags without first negotiating
+a corresponding extension with the client.  Clients that receive an
+unrecognized flag SHOULD close the connection.
+
+#### Structured reply types
+
+These values are used in the "type" field of a structured reply.
+Each type determines how to interpret the "length" bytes of
+payload.  If the client receives an unknown or unexpected type, it
+SHOULD close the connection.
+
+- `NBD_REPLY_TYPE_NONE` (0)
+
+  *length* MUST be 0 (and the payload field omitted).  This type
+  MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set
+  (that is, it is only useful as the final reply chunk).  If no
+  earlier error chunks were sent, then this type implies that the
+  overall client request is successful.
+
+  [option #A1]
+  Valid as a reply to `NBD_CMD_READ`.
+
+  [option #A2]
+  Valid as a reply to any request.
+
+- `NBD_REPLY_TYPE_ERROR` (1)
+
+  This reply type represents an error chunk.  *length* MUST be
+  exactly 4.  The payload is structured as:
+
+  32 bits: error (MUST be nonzero)  
+
+  This reply represents that an error occurred, and the client MUST
+  NOT make any assumptions about partial success.  This type SHOULD
+  NOT be used unless it is the final reply chunk (where the flag
+  `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed
+  by a chunk with type `NBD_REPLY_TYPE_NONE`.
+
+  [option #A1]
+  Valid as a reply to `NBD_CMD_READ`.
+
+  [option #A2]
+  Valid as a reply to any request.
+
+- `NBD_REPLY_TYPE_ERROR_OFFSET` (2)
+
+  This reply type represents an error chunk.  *length* MUST be
+  exactly 12.  The payload is structured as:
+
+  32 bits: error (MUST be nonzero)  
+  64 bits: offset (unsigned)  
+
+  In addition to declaring that an error occurred, this type
+  provides enough additional information to inform the client
+  about any partial success.  *offset* MUST lie within the bounds
+  of the original offset and length of the client's request.  If
+  *offset* also lies within the bounds of an earlier data chunk of
+  the same reply, then the client MAY assume that data within that
+  earlier chunk before *offset* is valid (while the rest of that
+  chunk MAY be bogus).  Any later data chunks of the same reply
+  MUST NOT contain the offset of this chunk.
+
+  Valid as a reply to `NBD_CMD_READ`.
+
+- `NBD_REPLY_TYPE_OFFSET_DATA` (3)
+
+  This reply type represents a data chunk.  *length* MUST be at
+  least 9.  The payload is structured as:
+
+  64 bits: offset (unsigned)  
+  *length - 8* bytes: data  
+
+  This reply represents the contents of *length - 8* bytes of the
+  file, starting at *offset*.  The data MUST lie within the bounds
+  of the original offset and length of the client's request, and
+  MUST NOT overlap with any earlier data or error chunks of the
+  same reply.
+
+  Valid as a reply to `NBD_CMD_READ`.
+
+- `NBD_REPLY_TYPE_OFFSET_HOLE` (4)
+
+  This reply type represents a data chunk.  *length* MUST be
+  exactly 12.  The payload is structured as:
+
+  64 bits: offset (unsigned)  
+  32 bits: hole size (unsigned)  
+
+  This reply represents that *hole size* bytes of the file (where
+  *hole size* MUST be non-zero), starting at *offset*, read as zeroes.
+  The hole MUST lie within the bounds of the original offset and
+  length of the client's request, and MUST NOT overlap with any
+  earlier data or error chunks of the same reply.
+
+  Valid as a reply to `NBD_CMD_READ`.
+
 #### Error values

 The error values are used for the error field in the reply message.
@@ -594,16 +828,22 @@ The following error values are defined:
 * `ENOMEM` (12), Cannot allocate memory.
 * `EINVAL` (22), Invalid argument.
 * `ENOSPC` (28), No space left on device.
-* `EOVERFLOW` (75), Value too large; MUST NOT be sent outside of the
-  experimental `STRUCTURED_REPLY` extension; see below.
+* `EOVERFLOW` (75), Value too large.

 The server SHOULD return `ENOSPC` if it receives a write request
 including one or more sectors beyond the size of the device.  It SHOULD
 return `EINVAL` if it receives a read or trim request including one or
 more sectors beyond the size of the device.  It also SHOULD map the
-`EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
+`EDQUOT` and `EFBIG` errors to `ENOSPC`.  It SHOULD return
 `EPERM` if it receives a write or trim request on a read-only export.

+The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
+client has requested `NBD_CMD_FLAG_DF` for a length that is too large
+to read without fragmentation.  The server SHOULD NOT return this error
+for a simple reply, MUST NOT return this on a read request that did
+not exceed 65,536 bytes, and SHOULD NOT return this error if
+`NBD_CMD_FLAG_DF` is not set.
+
 The server SHOULD return `EINVAL` if it receives an unknown command.

 The server SHOULD return `EINVAL` if it receives an unknown command flag. It
@@ -696,321 +936,6 @@ option reply type.
       message if they do not also send it as a reply to the
       `NBD_OPT_SELECT` message.

-### `STRUCTURED_REPLY` extension
-
-Some of the major downsides of the default simple reply to
-`NBD_CMD_READ` are as follows.  First, it is not possible to support
-partial reads (the command must succeed or fail as a whole, either len
-bytes of data must be sent or the connection must be closed).  There
-is no way to efficiently skip over portions of a sparse file that are
-known to contain all zeroes.  Finally, it is not possible to reliably
-decode the server traffic without also having context of what pending
-read requests were sent by the client.
-
-To remedy this, a `STRUCTURED_REPLY` extension is envisioned. This
-extension adds a new option request, a new transmission flag, a new
-reply type during the transmission phase, a new command flag, a new
-command error, and alters the reply to the `NBD_CMD_READ` request.
-
-* `NBD_OPT_STRUCTURED_REPLY`
-
-    The client wishes to use structured replies during the
-    transmission phase.  The option request has no additional data.
-
-    The server replies with the following:
-
-    - `NBD_REP_ACK`: Structured replies have been negotiated; the server
-      MUST set the `NBD_FLAG_SEND_DF` flag in all future transmission
-      flags, and MUST use structured replies to the `NBD_CMD_READ`
-      transmission request.  Further extensions that use structured
-      replies may now be negotiated.
-    - For backwards compatibility, clients should be prepared to also
-      handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
-      will be sent.
-
-    It is envisioned that future extensions will add other new
-    requests that also require a data payload in the reply.  Such
-    extensions MUST use a structured reply, and not a simple reply.  A
-    server that supports such extensions MUST NOT advertise those
-    extensions until the client negotiates structured replies; and a
-    client MUST NOT make use of those extensions without first
-    enabling the `NBD_OPT_STRUCTURED_REPLY` extension.
-
-* `NBD_FLAG_SEND_DF`
-
-    [option #B1 - transmission flags always mirror current state;
-    state change can be observed if negotiation happens after
-    NBD_OPT_LIST]
-    The server MUST set this transmission flag to 1 if structured
-    replies have been negotiated, and MUST NOT set this flag
-    otherwise; that way, the client MAY reliably use this flag as a
-    reliable witness of whether to expect a simple reply or structured
-    reply to the `NBD_CMD_READ` transmission request.
-
-    [option #B2 - final transmission flags are accurate, but
-    intermediate transmission flags can anticipate negotiation; state
-    change can be observed if negotiation does not happen]
-    When responding to the `NBD_OPT_EXPORT_NAME` option request (or
-    the `NBD_OPT_SELECT` request of the experimental `SELECT`
-    extension), the server MUST set this transmission flag to 1 if
-    structured replies have been negotiated, and MUST NOT set this
-    flag otherwise; that way, the client MAY reliably use the final
-    state of this flag as a reliable witness of whether to expect a
-    simple reply or structured reply to the `NBD_CMD_READ`
-    transmission request.  When responding to the `NBD_OPT_LIST`
-    option request, the server MAY set this transmission flag, even if
-    structured replies have not yet been negotiated.
-
-    [all options]
-    Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request
-    flag unless this transmission flag is set.
-
-* Transmission phase
-
-    The transmission phase includes a third message type: the
-    structured reply, to be used for commands where the response must
-    include a data payload.  The server MUST NOT send this reply type
-    unless the client has successfully negotiated structured replies
-    via `NBD_OPT_STRUCTURED_REPLY`.  Conversely, the server MUST NOT
-    use a simple reply for `NBD_CMD_READ` if structured replies are
-    negotiated.
-
-    [option #A1, but not #A2 or #A3]
-    The server MUST NOT use structured replies for requests that never
-    require a data payload in the response.
-
-    Unless explicitly documented for a given request, a structured
-    reply MUST occupy only one message (similar to a simple reply).
-    However, some requests document that a structured reply MAY occupy
-    multiple chunks; each chunk uses a structured reply message (all
-    with the same value for "handle"), and the `NBD_REPLY_FLAG_DONE`
-    reply flag is used to identify the final chunk.  Where multiple
-    chunks are permitted, the intermediate chunks MAY be reordered
-    within constraints documented by the request, and the chunks MAY
-    be interleaved with messages from other pending transactions; but
-    the final chunk MUST always end the reply.
-
-    A structured reply message looks as follows:
-
-    S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
-    S: 16 bits, flags  
-    S: 16 bits, type  
-    S: 64 bits, handle  
-    S: 32 bits, length of payload (unsigned)  
-    S: *length* bytes of payload data (if *length* is non-zero)
-
-    The use of *length* in the reply allows context-free division of
-    the overall server traffic into individual reply messages; the
-    *type* field describes how to further interpret the payload.
-
-  * Structured reply flags
-
-    This field of 16 bits is sent by the server as part of every
-    structured reply.
-
-    - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if
-      more structured reply chunks will be sent for the same client
-      request, and MUST set this bit if this is the final reply.  This
-      flag must always be set in response to requests which are
-      documented as using a structured reply, but not documented as
-      permitting multiple chunks.
-
-    The server MUST NOT set any other flags without first negotiating
-    the extension with the client.  Clients that receive an
-    unrecognized flag SHOULD close the connection.
-
-  * Structured Reply types
-
-    These values are used in the "type" field of a structured reply.
-    Each type determines how to interpret the "length" bytes of
-    payload.  If the client receives an unknown or unexpected type, it
-    SHOULD close the connection.
-
-    - `NBD_REPLY_TYPE_NONE` (0)
-
-      *length* MUST be 0 (and the payload field omitted).  This type
-       MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set
-       (that is, it is only useful as the final reply chunk).  If no
-       earlier error chunks were sent, then this type implies that the
-       overall client request is successful.
-
-      [option #A1]
-      Valid as a reply to `NBD_CMD_READ`.
-
-      [option #A2]
-      Valid as a reply to any request.
-
-    - `NBD_REPLY_TYPE_ERROR` (1)
-
-      This reply type represents an error chunk.  *length* MUST be
-      exactly 4.  The payload is structured as:
-
-      32 bits: error (MUST be nonzero)  
-
-      This reply represents that an error occurred, and the client MAY
-      NOT make any assumptions about partial success. This type SHOULD
-      NOT be used unless it is the final reply chunk (where the flag
-      `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed
-      by a chunk with type `NBD_REPLY_TYPE_NONE`.
-
-      [option #A1]
-      Valid as a reply to `NBD_CMD_READ`.
-
-      [option #A2]
-      Valid as a reply to any request.
-
-    - `NBD_REPLY_TYPE_ERROR_OFFSET` (2)
-
-      This reply type represents an error chunk.  *length* MUST be
-      exactly 12.  The payload is structured as:
-
-      32 bits: error (MUST be nonzero)  
-      64 bits: offset (unsigned)  
-
-      In addition to declaring that an error occurred, this type
-      provides enough additional information to inform the client
-      about any partial success.  *offset* MUST lie within the bounds
-      of the original offset and length of the client's request.  If
-      *offset* also lies within the bounds of an earlier data chunk of
-      the same reply, then the client MAY assume that data within that
-      earlier chunk is valid (while the rest of that chunk MAY be
-      bogus).  Any later data chunks of the same reply MUST NOT
-      contain the offset of this chunk.
-
-      Valid as a reply to `NBD_CMD_READ`.
-
-    - `NBD_REPLY_TYPE_OFFSET_DATA` (3)
-
-      This reply type represents a data chunk.  *length* MUST be at
-      least 9.  The payload is structured as:
-
-      64 bits: offset (unsigned)  
-      *length - 8* bytes: data  
-
-      This reply represents the contents of *length - 8* bytes of the
-      file, starting at *offset*.  The data MUST lie within the bounds
-      of the original offset and length of the client's request, and
-      MUST NOT overlap with any earlier data or error chunks of the
-      same reply.
-
-      Valid as a reply to `NBD_CMD_READ`.
-
-    - `NBD_REPLY_TYPE_OFFSET_HOLE` (4)
-
-      This reply type represents a data chunk.  *length* MUST be
-      exactly 12.  The payload is structured as:
-
-      64 bits: offset (unsigned)  
-      32 bits: hole size (unsigned)  
-
-      This reply represents that *hole size* bytes of the file (which
-      MUST be non-zero), starting at *offset*, read as all zeroes.
-      The hole MUST lie within the bounds of the original offset and
-      length of the client's request, and MUST NOT overlap with any
-      earlier data or error chunks of the same reply.
-
-      Valid as a reply to `NBD_CMD_READ`.
-
-* `NBD_CMD_FLAG_DF`
-
-    The "don't fragment" bit, valid during `NBD_CMD_READ`.  SHOULD be
-    set to 1 if the client requires the server to send at most one
-    data chunk in reply.  MUST NOT be set unless the transmission
-    flags include `NBD_FLAG_SEND_DF`.  Use of this flag MAY trigger an
-    `EOVERFLOW` error chunk, if the request length is too large.
-
-* `EOVERFLOW`
-
-    The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
-    client has requested `NBD_CMD_FLAG_DF` for a length that is too
-    large to read without fragmentation.  The server MUST NOT return
-    this error if the read request did not exceed 65,536 bytes, and
-    SHOULD NOT return this error if `NBD_CMD_FLAG_DF` is not set.
-
-* `NBD_CMD_READ`
-
-    If structured replies were not negotiated, then a read request
-    MUST always be answered by a simple reply, as documented above
-    (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
-    length bytes of data according to the client's request, although
-    those bytes MAY be invalid if an error is returned, and the
-    connection MUST be closed if an error occurs after a header
-    claiming no error).
-
-    If structured replies are negotiated, then a read request MUST
-    result in a structured reply that MAY contain one or more chunks
-    (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with
-    the following additional constraints.
-
-    The server MAY split the reply into any number of data chunks
-    (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and
-    `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least
-    one byte, although to minimize overhead, the server SHOULD use
-    chunks where lengths and offsets are an integer multiple of 512
-    bytes, where possible (the first and last chunk of an unaligned
-    read being the most obvious place for an exception).  The server
-    MUST NOT send data chunks that overlap each other or any earlier
-    error chunks, and MUST NOT send chunks that describe data outside
-    the offset and length of the request, but MAY send the chunks in
-    any order (the client MUST reassemble data chunks into the correct
-    order), and MAY send additional data chunks even after reporting
-    an error chunk.  Note that a request for more than 2^32 - 8 bytes
-    MUST be split into at least two chunks, so as not to overflow the
-    length field of a reply while still allowing space for the offset
-    of each chunk.  When no error is detected, the server MUST send
-    enough data chunks to cover the entire region described by the
-    offset and length of the client's request.
-
-    To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE`
-    on the final data chunk (in which case it MUST NOT send any
-    further non-data chunks), but MUST NOT do so if it would still be
-    possible to detect an error while transmitting the chunk.  If the
-    last data chunk is not the final reply, the server MUST send a
-    final chunk with type `NBD_REPLY_TYPE_NONE` (and the flag
-    `NBD_REPLY_FLAG_DONE` set) to indicate success, or send an error
-    chunk.
-
-    If an error is detected, the server MUST still complete the
-    transmission of any current chunk (it SHOULD use padding bytes of
-    zero for any remaining data portion of
-    `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks.
-    The server MUST include an error chunk as one of the subsequent
-    chunks, but MAY defer the error reporting behind other queued
-    chunks.  An error chunk of type `NBD_REPLY_TYPE_ERROR` implies
-    that the client MAY NOT make any assumptions about validity of
-    data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as
-    the final chunk, or be immediately followed by a chunk of type
-    `NBD_REPLY_TYPE_NONE`.  On the other hand, an error chunk of type
-    `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about
-    which earlier data chunk(s) encountered a failure, and MAY also be
-    sent in lieu of a data chunk; as such, a server MAY still usefully
-    follow it with further data chunks or further error offsets.
-    Generally, a server SHOULD NOT mix errors with offsets with a
-    generic error.  As long as all errors are accompanied by offsets,
-    the client MAY assume that any data chunks with no subsequent
-    error are valid, that chunks with errors are valid up until the
-    reported offset, and portions of the read that do not have a
-    corresponding data chunk are not valid.  If the final data or
-    error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then
-    the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to
-    complete the reply, but the client MUST NOT treat this type as
-    success if an earlier data chunk was sent.
-
-    A client MAY close the connection if it detects that the server
-    has sent invalid chunks (such as overlapping data, or not enough
-    data before claiming success).
-
-    In order to avoid the burden of reassembly, the client MAY set the
-    `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not
-    fragment the reply.  If this flag is set, the server MUST send at
-    most one data chunk, although it MAY still send multiple chunks
-    (the remaining chunks would be error chunks or a final type of
-    `NBD_REPLY_TYPE_NONE`).  A server MAY reject a client's request
-    with the error `EOVERFLOW` if the length is too large to send
-    without fragmentation, in which case it MUST NOT send a data
-    chunk; however, the server MUST NOT use this if error the client's
-    requested length does not exceed 65,536 bytes.
-
 ## About this file

 This file tries to document the NBD protocol as it is currently
-- 
2.5.5



