[Nbd] [PATCH v3 5/5] RFC: doc: Promote structured reply out of experimental
- To: nbd-general@lists.sourceforge.net
- Cc: w@...112...
- Subject: [Nbd] [PATCH v3 5/5] RFC: doc: Promote structured reply out of experimental
- From: Eric Blake <eblake@...696...>
- Date: Thu, 31 Mar 2016 00:06:24 -0600
- Message-id: <1459404384-5258-6-git-send-email-eblake@...696...>
- In-reply-to: <1459404384-5258-1-git-send-email-eblake@...696...>
- References: <1459404384-5258-1-git-send-email-eblake@...696...>
Should not be applied until we have a working implementation,
in case we need to tweak things.
Demonstrates the amount of word-smithing required to promote
structured replies to non-experimental. In many cases, I was
able to preserve entire paragraphs (but sometimes reflowed at
different indentation).
Signed-off-by: Eric Blake <eblake@...696...>
---
doc/proto.md | 621 ++++++++++++++++++++++++++---------------------------------
1 file changed, 273 insertions(+), 348 deletions(-)
diff --git a/doc/proto.md b/doc/proto.md
index cd59d81..7bc65f8 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -182,29 +182,32 @@ required to.
### Transmission
-There are two message types in the transmission phase: the request,
-and the simple reply. The phase consists of a series of transactions,
-where the client submits requests and the server sends corresponding
-replies, with a single simple reply message per request, and continues
-until either side closes the connection.
+There are three message types in the transmission phase: the request,
+the simple reply, and the structured reply chunk. The phase consists
+of a series of transactions, where the client submits requests and the
+server sends corresponding replies, either a single simple reply or a
+series of one or more structured reply chunks delineated by a
+concluding flag. This phase continues until either side closes the
+connection.
Replies need not be sent in the same order as requests (i.e., requests
-may be handled by the server asynchronously). Clients SHOULD use a
-handle that is distinct from all other currently pending transactions,
-but MAY reuse handles that are no longer in flight; handles need not
-be consecutive. In each reply message, the server MUST use the same
-value for handle as was sent by the client in the corresponding
-request. In this way, the client can correlate which request is
-receiving a response.
+may be handled by the server asynchronously). Where a reply consists
+of multiple structured reply chunks, the intermediate chunks MAY be
+reordered within constraints documented by the request, and the chunks
+MAY be interleaved with messages from other pending transactions.
+Clients SHOULD use a handle that is distinct from all other currently
+pending transactions, but MAY reuse handles that are no longer in
+flight; handles need not be consecutive. In each reply message, the
+server MUST use the same value for handle as was sent by the client in
+the corresponding request. In this way, the client can correlate
+which request is receiving a response.
Note that it is impossible to tell by reading just the server traffic
whether a data field of a simple reply will be present; the simple
reply is also problematic for error handling of the `NBD_CMD_READ`
-request. Therefore, the experimental `STRUCTURED_REPLY` extension
-creates a context-free server stream by adding an additional
-structured reply type, and documents that it is possible to have
-multiple structured reply messages (called chunks) in response to a
-single request message; see below.
+request. Therefore, servers SHOULD support the structured reply
+extension, and "fixed newstyle" clients SHOULD use
+`NBD_OPT_STRUCTURED_REPLY` to negotiate structured replies.
#### Request message
@@ -245,6 +248,28 @@ S: 32 bits, error (MAY be zero)
S: 64 bits, handle
S: (*length* bytes of data if the request is of type `NBD_CMD_READ`)
+#### Structured reply message chunk
+
+ Unless explicitly documented for a given request, a structured reply
+ MUST occupy only one message (similar to a simple reply). However,
+ some requests document that a structured reply MAY occupy multiple
+ chunks; each chunk uses a structured reply message (all with the
+ same value for "handle"), and the `NBD_REPLY_FLAG_DONE` reply flag
+ is used to identify the final chunk.
+
+ A structured reply message looks as follows:
+
+ S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
+ S: 16 bits, flags
+ S: 16 bits, type
+ S: 64 bits, handle
+ S: 32 bits, length of payload (unsigned)
+ S: *length* bytes of payload data (if *length* is non-zero)
+
+ The use of *length* in the reply allows context-free division of the
+ overall server traffic into individual reply messages; the *type*
+ field describes how to further interpret the payload.
+
## Values
This section describes the value and meaning of constants (other than
@@ -288,8 +313,14 @@ immediately after the handshake flags field in oldstyle negotiation:
schedule I/O accesses as for a rotational medium
- bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
`NBD_CMD_TRIM` commands
-- bit 6, `NBD_FLAG_SEND_DF`; defined by the `STRUCTURED_REPLY` extension;
- see below.
+- bit 6, `NBD_FLAG_SEND_DF`; MUST be set to 1 if structured replies
+ have been negotiated, and MUST NOT be set otherwise; that way, the
+ client MAY use this flag as a reliable witness of whether to expect
+ a simple reply or structured reply to the `NBD_CMD_READ`
+ transmission request.
+
+ Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request
+ flag unless this transmission flag is set.
Clients SHOULD ignore unknown flags.
@@ -380,7 +411,27 @@ of the newstyle negotiation.
- `NBD_OPT_STRUCTURED_REPLY` (8)
- Defined by the experimental `STRUCTURED_REPLY` extension; see below.
+ The client wishes to use structured replies during the
+ transmission phase. The option request has no additional data.
+
+ The server replies with the following:
+
+ - `NBD_REP_ACK`: Structured replies have been negotiated; the
+ server MUST set the `NBD_FLAG_SEND_DF` flag in all future
+ transmission flags, and MUST use structured replies to the
+ `NBD_CMD_READ` transmission request. Further extensions that
+ use structured replies may now be negotiated.
+ - For backwards compatibility, clients should be prepared to also
+ handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
+ will be sent.
+
+ It is envisioned that future extensions will add other new
+ requests that also require a data payload in the reply. Such
+ extensions MUST use a structured reply, and not a simple reply. A
+ server that supports such extensions MUST NOT advertise those
+ extensions until the client negotiates structured replies; and a
+ client MUST NOT make use of those extensions without first
+ enabling the `NBD_OPT_STRUCTURED_REPLY` extension.
#### Option reply types
@@ -481,8 +532,13 @@ valid may depend on negotiation during the handshake phase.
set to 1 if the client requires "Force Unit Access" mode of
operation. MUST NOT be set unless transmission flags included
`NBD_FLAG_SEND_FUA`.
-- bit 1, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
- extension; see below
+
+- bit 1, `NBD_CMD_FLAG_DF`; valid during `NBD_CMD_READ`. The "don't
+ fragment" bit. SHOULD be set to 1 if the client requires the server
+ to send at most one data chunk in reply. MUST NOT be set unless the
+ transmission flags include `NBD_FLAG_SEND_DF`. Use of this flag MAY
+ trigger an `EOVERFLOW` error chunk, if the request length is too
+ large.
#### Request types
@@ -490,10 +546,11 @@ The following request types exist:
* `NBD_CMD_READ` (0)
- A read request. Length and offset define the data to be read. The
- server MUST reply with a simple reply header, followed immediately
- by len bytes of data, read from offset bytes into the file, unless
- an error condition has occurred.
+ A read request. Length and offset define the data to be read. If
+ structured replies have not been negotiated, the server MUST reply
+ with a simple reply header, followed immediately by len bytes of
+ data, read from offset bytes into the file, unless an error
+ condition has occurred.
If an error occurs, the server SHOULD set the appropriate error code
in the error field. The server MUST then either close the
@@ -504,10 +561,79 @@ The following request types exist:
signalling no error), the server MUST immediately close the
connection; it MUST NOT send any further data to the client.
- The experimental `STRUCTURED_REPLY` extension changes from a
- simple reply to a structured reply, in part to allow recovery
- after a partial read and more efficient reads of sparse files; see
- below.
+ If structured replies are negotiated, then a read request MUST
+ result in a structured reply that MAY contain one or more chunks
+ (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with
+ the following additional constraints.
+
+ The server MAY split the reply into any number of data chunks
+ (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and
+ `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least
+ one byte, although to minimize overhead, the server SHOULD use
+ chunks where lengths and offsets are an integer multiple of 512
+ bytes, where possible (the first and last chunk of an unaligned
+ read being the most obvious place for an exception). The server
+ MUST NOT send data chunks that overlap each other or any earlier
+ error chunks, and MUST NOT send chunks that describe data outside
+ the offset and length of the request, but MAY send the chunks in
+ any order (the client MUST reassemble data chunks into the correct
+ order), and MAY send additional data chunks even after reporting
+ an error chunk. Note that a request for more than 2^32 - 8 bytes
+ MUST be split into at least two chunks, so as not to overflow the
+ length field of a reply while still allowing space for the offset
+ of each chunk. When no error is detected, the server MUST send
+ enough data chunks to cover the entire region described by the
+ offset and length of the client's request.
+
+ To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE`
+ flag on the final data chunk (in which case it MUST NOT send any
+ further chunks), but MUST NOT do so if it would still be
+ possible to detect an error while transmitting the chunk. If the
+ last data chunk is not the final reply, the server MUST send a
+ final chunk with type `NBD_REPLY_TYPE_NONE` (and the flag
+ `NBD_REPLY_FLAG_DONE` set) to indicate success, or send an error
+ chunk.
+
+ If an error is detected, the server MUST still complete the
+ transmission of any current chunk (it SHOULD use padding bytes of
+ zero for any remaining data portion of
+ `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks.
+ The server MUST include an error chunk as one of the subsequent
+ chunks, but MAY defer the error reporting behind other queued
+ chunks. An error chunk of type `NBD_REPLY_TYPE_ERROR` implies
+ that the client MUST NOT make any assumptions about validity of
+ data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as
+ the final chunk, or be immediately followed by a chunk of type
+ `NBD_REPLY_TYPE_NONE`. On the other hand, an error chunk of type
+ `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about
+ which earlier data chunk(s) encountered a failure, and MAY also be
+ sent in lieu of a data chunk; as such, a server MAY still usefully
+ follow it with further data chunks or further error offsets.
+ Generally, a server SHOULD NOT mix offset-bearing errors with a
+ generic error. As long as all errors are accompanied by offsets,
+ the client MAY assume that any data chunks with no subsequent
+ error are valid, that chunks with errors are valid up until the
+ reported offset, and portions of the read that do not have a
+ corresponding data chunk are not valid. If the final data or
+ error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then
+ the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to
+ complete the reply, but the client MUST NOT treat this type as
+ success if an earlier error chunk was sent.
+
+ A client MAY close the connection if it detects that the server
+ has sent invalid chunks (such as overlapping data, or not enough
+ data before claiming success).
+
+ In order to avoid the burden of reassembly, the client MAY set the
+ `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not
+ fragment the reply. If this flag is set, the server MUST send at
+ most one data chunk, although it MAY still send multiple chunks
+ (the remaining chunks would be error chunks or a final type of
+ `NBD_REPLY_TYPE_NONE`). A server MAY reject a client's request
+ with the error `EOVERFLOW` if the length is too large to send
+ without fragmentation, in which case it MUST NOT send a data
+ chunk; however, the server MUST NOT use this error if the client's
+ requested length does not exceed 65,536 bytes.
* `NBD_CMD_WRITE` (1)
@@ -574,6 +700,114 @@ The following request types exist:
Currently one such message is known: `NBD_CMD_CACHE`, with type set to
5, implemented by xnbd.
+#### Structured reply flags
+
+ This field of 16 bits is sent by the server as part of every
+ structured reply.
+
+ - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if
+ more structured reply chunks will be sent for the same client
+ request, and MUST set this bit if this is the final reply. This
+ flag MUST always be set in response to requests which are
+ documented as using a structured reply, but not documented as
+ permitting multiple chunks.
+
+ The server MUST NOT set any other flags without first negotiating
+ the extension with the client. Clients that receive an
+ unrecognized flag SHOULD close the connection.
+
+#### Structured reply types
+
+ These values are used in the "type" field of a structured reply.
+ Each type determines how to interpret the "length" bytes of
+ payload. If the client receives an unknown or unexpected type, it
+ SHOULD close the connection.
+
+ - `NBD_REPLY_TYPE_NONE` (0)
+
+ *length* MUST be 0 (and the payload field omitted). This type
+ MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set
+ (that is, it is only useful as the final reply chunk). If no
+ earlier error chunks were sent, then this type implies that the
+ overall client request is successful.
+
+ [option #A1]
+ Valid as a reply to `NBD_CMD_READ`.
+
+ [option #A2]
+ Valid as a reply to any request.
+
+ - `NBD_REPLY_TYPE_ERROR` (1)
+
+ This reply type represents an error chunk. *length* MUST be
+ exactly 4. The payload is structured as:
+
+ 32 bits: error (MUST be nonzero)
+
+ This reply represents that an error occurred, and the client MUST
+ NOT make any assumptions about partial success. This type SHOULD
+ NOT be used unless it is the final reply chunk (where the flag
+ `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed
+ by a chunk with type `NBD_REPLY_TYPE_NONE`.
+
+ [option #A1]
+ Valid as a reply to `NBD_CMD_READ`.
+
+ [option #A2]
+ Valid as a reply to any request.
+
+ - `NBD_REPLY_TYPE_ERROR_OFFSET` (2)
+
+ This reply type represents an error chunk. *length* MUST be
+ exactly 12. The payload is structured as:
+
+ 32 bits: error (MUST be nonzero)
+ 64 bits: offset (unsigned)
+
+ In addition to declaring that an error occurred, this type
+ provides enough additional information to inform the client
+ about any partial success. *offset* MUST lie within the bounds
+ of the original offset and length of the client's request. If
+ *offset* also lies within the bounds of an earlier data chunk of
+ the same reply, then the client MAY assume that data within that
+ earlier chunk is valid up to *offset* (while the rest of that chunk
+ MAY be bogus). Any later data chunks of the same reply MUST NOT
+ contain the offset of this chunk.
+
+ Valid as a reply to `NBD_CMD_READ`.
+
+ - `NBD_REPLY_TYPE_OFFSET_DATA` (3)
+
+ This reply type represents a data chunk. *length* MUST be at
+ least 9. The payload is structured as:
+
+ 64 bits: offset (unsigned)
+ *length - 8* bytes: data
+
+ This reply represents the contents of *length - 8* bytes of the
+ file, starting at *offset*. The data MUST lie within the bounds
+ of the original offset and length of the client's request, and
+ MUST NOT overlap with any earlier data or error chunks of the
+ same reply.
+
+ Valid as a reply to `NBD_CMD_READ`.
+
+ - `NBD_REPLY_TYPE_OFFSET_HOLE` (4)
+
+ This reply type represents a data chunk. *length* MUST be
+ exactly 12. The payload is structured as:
+
+ 64 bits: offset (unsigned)
+ 32 bits: hole size (unsigned)
+
+ This reply represents that *hole size* bytes of the file (which
+ MUST be non-zero), starting at *offset*, read as all zeroes.
+ The hole MUST lie within the bounds of the original offset and
+ length of the client's request, and MUST NOT overlap with any
+ earlier data or error chunks of the same reply.
+
+ Valid as a reply to `NBD_CMD_READ`.
+
#### Error values
The error values are used for the error field in the reply message.
@@ -594,16 +828,22 @@ The following error values are defined:
* `ENOMEM` (12), Cannot allocate memory.
* `EINVAL` (22), Invalid argument.
* `ENOSPC` (28), No space left on device.
-* `EOVERFLOW` (75), Value too large; MUST NOT be sent outside of the
- experimental `STRUCTURED_REPLY` extension; see below.
+* `EOVERFLOW` (75), Value too large.
The server SHOULD return `ENOSPC` if it receives a write request
including one or more sectors beyond the size of the device. It SHOULD
return `EINVAL` if it receives a read or trim request including one or
more sectors beyond the size of the device. It also SHOULD map the
-`EDQUOT` and `EFBIG` errors to `ENOSPC`. Finally, it SHOULD return
+`EDQUOT` and `EFBIG` errors to `ENOSPC`. It SHOULD return
`EPERM` if it receives a write or trim request on a read-only export.
+The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
+client has requested `NBD_CMD_FLAG_DF` for a length that is too large
+to read without fragmentation. The server SHOULD NOT return this error
+for a simple reply, MUST NOT return this on a read request that did
+not exceed 65,536 bytes, and SHOULD NOT return this error if
+`NBD_CMD_FLAG_DF` is not set.
+
The server SHOULD return `EINVAL` if it receives an unknown command.
The server SHOULD return `EINVAL` if it receives an unknown command flag. It
@@ -696,321 +936,6 @@ option reply type.
message if they do not also send it as a reply to the
`NBD_OPT_SELECT` message.
-### `STRUCTURED_REPLY` extension
-
-Some of the major downsides of the default simple reply to
-`NBD_CMD_READ` are as follows. First, it is not possible to support
-partial reads (the command must succeed or fail as a whole, either len
-bytes of data must be sent or the connection must be closed). There
-is no way to efficiently skip over portions of a sparse file that are
-known to contain all zeroes. Finally, it is not possible to reliably
-decode the server traffic without also having context of what pending
-read requests were sent by the client.
-
-To remedy this, a `STRUCTURED_REPLY` extension is envisioned. This
-extension adds a new option request, a new transmission flag, a new
-reply type during the transmission phase, a new command flag, a new
-command error, and alters the reply to the `NBD_CMD_READ` request.
-
-* `NBD_OPT_STRUCTURED_REPLY`
-
- The client wishes to use structured replies during the
- transmission phase. The option request has no additional data.
-
- The server replies with the following:
-
- - `NBD_REP_ACK`: Structured replies have been negotiated; the server
- MUST set the `NBD_FLAG_SEND_DF` flag in all future transmission
- flags, and MUST use structured replies to the `NBD_CMD_READ`
- transmission request. Further extensions that use structured
- replies may now be negotiated.
- - For backwards compatibility, clients should be prepared to also
- handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
- will be sent.
-
- It is envisioned that future extensions will add other new
- requests that also require a data payload in the reply. Such
- extensions MUST use a structured reply, and not a simple reply. A
- server that supports such extensions MUST NOT advertise those
- extensions until the client negotiates structured replies; and a
- client MUST NOT make use of those extensions without first
- enabling the `NBD_OPT_STRUCTURED_REPLY` extension.
-
-* `NBD_FLAG_SEND_DF`
-
- [option #B1 - transmission flags always mirror current state;
- state change can be observed if negotiation happens after
- NBD_OPT_LIST]
- The server MUST set this transmission flag to 1 if structured
- replies have been negotiated, and MUST NOT set this flag
- otherwise; that way, the client MAY reliably use this flag as a
- reliable witness of whether to expect a simple reply or structured
- reply to the `NBD_CMD_READ` transmission request.
-
- [option #B2 - final transmission flags are accurate, but
- intermediate transmission flags can anticipate negotiation; state
- change can be observed if negotiation does not happen]
- When responding to the `NBD_OPT_EXPORT_NAME` option request (or
- the `NBD_OPT_SELECT` request of the experimental `SELECT`
- extension), the server MUST set this transmission flag to 1 if
- structured replies have been negotiated, and MUST NOT set this
- flag otherwise; that way, the client MAY reliably use the final
- state of this flag as a reliable witness of whether to expect a
- simple reply or structured reply to the `NBD_CMD_READ`
- transmission request. When responding to the `NBD_OPT_LIST`
- option request, the server MAY set this transmission flag, even if
- structured replies have not yet been negotiated.
-
- [all options]
- Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request
- flag unless this transmission flag is set.
-
-* Transmission phase
-
- The transmission phase includes a third message type: the
- structured reply, to be used for commands where the response must
- include a data payload. The server MUST NOT send this reply type
- unless the client has successfully negotiated structured replies
- via `NBD_OPT_STRUCTURED_REPLY`. Conversely, the server MUST NOT
- use a simple reply for `NBD_CMD_READ` if structured replies are
- negotiated.
-
- [option #A1, but not #A2 or #A3]
- The server MUST NOT use structured replies for requests that never
- require a data payload in the response.
-
- Unless explicitly documented for a given request, a structured
- reply MUST occupy only one message (similar to a simple reply).
- However, some requests document that a structured reply MAY occupy
- multiple chunks; each chunk uses a structured reply message (all
- with the same value for "handle"), and the `NBD_REPLY_FLAG_DONE`
- reply flag is used to identify the final chunk. Where multiple
- chunks are permitted, the intermediate chunks MAY be reordered
- within constraints documented by the request, and the chunks MAY
- be interleaved with messages from other pending transactions; but
- the final chunk MUST always end the reply.
-
- A structured reply message looks as follows:
-
- S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
- S: 16 bits, flags
- S: 16 bits, type
- S: 64 bits, handle
- S: 32 bits, length of payload (unsigned)
- S: *length* bytes of payload data (if *length* is non-zero)
-
- The use of *length* in the reply allows context-free division of
- the overall server traffic into individual reply messages; the
- *type* field describes how to further interpret the payload.
-
- * Structured reply flags
-
- This field of 16 bits is sent by the server as part of every
- structured reply.
-
- - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if
- more structured reply chunks will be sent for the same client
- request, and MUST set this bit if this is the final reply. This
- flag must always be set in response to requests which are
- documented as using a structured reply, but not documented as
- permitting multiple chunks.
-
- The server MUST NOT set any other flags without first negotiating
- the extension with the client. Clients that receive an
- unrecognized flag SHOULD close the connection.
-
- * Structured Reply types
-
- These values are used in the "type" field of a structured reply.
- Each type determines how to interpret the "length" bytes of
- payload. If the client receives an unknown or unexpected type, it
- SHOULD close the connection.
-
- - `NBD_REPLY_TYPE_NONE` (0)
-
- *length* MUST be 0 (and the payload field omitted). This type
- MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set
- (that is, it is only useful as the final reply chunk). If no
- earlier error chunks were sent, then this type implies that the
- overall client request is successful.
-
- [option #A1]
- Valid as a reply to `NBD_CMD_READ`.
-
- [option #A2]
- Valid as a reply to any request.
-
- - `NBD_REPLY_TYPE_ERROR` (1)
-
- This reply type represents an error chunk. *length* MUST be
- exactly 4. The payload is structured as:
-
- 32 bits: error (MUST be nonzero)
-
- This reply represents that an error occurred, and the client MAY
- NOT make any assumptions about partial success. This type SHOULD
- NOT be used unless it is the final reply chunk (where the flag
- `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed
- by a chunk with type `NBD_REPLY_TYPE_NONE`.
-
- [option #A1]
- Valid as a reply to `NBD_CMD_READ`.
-
- [option #A2]
- Valid as a reply to any request.
-
- - `NBD_REPLY_TYPE_ERROR_OFFSET` (2)
-
- This reply type represents an error chunk. *length* MUST be
- exactly 12. The payload is structured as:
-
- 32 bits: error (MUST be nonzero)
- 64 bits: offset (unsigned)
-
- In addition to declaring that an error occurred, this type
- provides enough additional information to inform the client
- about any partial success. *offset* MUST lie within the bounds
- of the original offset and length of the client's request. If
- *offset* also lies within the bounds of an earlier data chunk of
- the same reply, then the client MAY assume that data within that
- earlier chunk is valid (while the rest of that chunk MAY be
- bogus). Any later data chunks of the same reply MUST NOT
- contain the offset of this chunk.
-
- Valid as a reply to `NBD_CMD_READ`.
-
- - `NBD_REPLY_TYPE_OFFSET_DATA` (3)
-
- This reply type represents a data chunk. *length* MUST be at
- least 9. The payload is structured as:
-
- 64 bits: offset (unsigned)
- *length - 8* bytes: data
-
- This reply represents the contents of *length - 8* bytes of the
- file, starting at *offset*. The data MUST lie within the bounds
- of the original offset and length of the client's request, and
- MUST NOT overlap with any earlier data or error chunks of the
- same reply.
-
- Valid as a reply to `NBD_CMD_READ`.
-
- - `NBD_REPLY_TYPE_OFFSET_HOLE` (4)
-
- This reply type represents a data chunk. *length* MUST be
- exactly 12. The payload is structured as:
-
- 64 bits: offset (unsigned)
- 32 bits: hole size (unsigned)
-
- This reply represents that *hole size* bytes of the file (which
- MUST be non-zero), starting at *offset*, read as all zeroes.
- The hole MUST lie within the bounds of the original offset and
- length of the client's request, and MUST NOT overlap with any
- earlier data or error chunks of the same reply.
-
- Valid as a reply to `NBD_CMD_READ`.
-
-* `NBD_CMD_FLAG_DF`
-
- The "don't fragment" bit, valid during `NBD_CMD_READ`. SHOULD be
- set to 1 if the client requires the server to send at most one
- data chunk in reply. MUST NOT be set unless the transmission
- flags include `NBD_FLAG_SEND_DF`. Use of this flag MAY trigger an
- `EOVERFLOW` error chunk, if the request length is too large.
-
-* `EOVERFLOW`
-
- The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
- client has requested `NBD_CMD_FLAG_DF` for a length that is too
- large to read without fragmentation. The server MUST NOT return
- this error if the read request did not exceed 65,536 bytes, and
- SHOULD NOT return this error if `NBD_CMD_FLAG_DF` is not set.
-
-* `NBD_CMD_READ`
-
- If structured replies were not negotiated, then a read request
- MUST always be answered by a simple reply, as documented above
- (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
- length bytes of data according to the client's request, although
- those bytes MAY be invalid if an error is returned, and the
- connection MUST be closed if an error occurs after a header
- claiming no error).
-
- If structured replies are negotiated, then a read request MUST
- result in a structured reply that MAY contain one or more chunks
- (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with
- the following additional constraints.
-
- The server MAY split the reply into any number of data chunks
- (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and
- `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least
- one byte, although to minimize overhead, the server SHOULD use
- chunks where lengths and offsets are an integer multiple of 512
- bytes, where possible (the first and last chunk of an unaligned
- read being the most obvious place for an exception). The server
- MUST NOT send data chunks that overlap each other or any earlier
- error chunks, and MUST NOT send chunks that describe data outside
- the offset and length of the request, but MAY send the chunks in
- any order (the client MUST reassemble data chunks into the correct
- order), and MAY send additional data chunks even after reporting
- an error chunk. Note that a request for more than 2^32 - 8 bytes
- MUST be split into at least two chunks, so as not to overflow the
- length field of a reply while still allowing space for the offset
- of each chunk. When no error is detected, the server MUST send
- enough data chunks to cover the entire region described by the
- offset and length of the client's request.
-
- To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE`
- on the final data chunk (in which case it MUST NOT send any
- further non-data chunks), but MUST NOT do so if it would still be
- possible to detect an error while transmitting the chunk. If the
- last data chunk is not the final reply, the server MUST send a
- final chunk with type `NBD_REPLY_TYPE_NONE` (and the flag
- `NBD_REPLY_FLAG_DONE` set) to indicate success, or send an error
- chunk.
-
- If an error is detected, the server MUST still complete the
- transmission of any current chunk (it SHOULD use padding bytes of
- zero for any remaining data portion of
- `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks.
- The server MUST include an error chunk as one of the subsequent
- chunks, but MAY defer the error reporting behind other queued
- chunks. An error chunk of type `NBD_REPLY_TYPE_ERROR` implies
- that the client MAY NOT make any assumptions about validity of
- data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as
- the final chunk, or be immediately followed by a chunk of type
- `NBD_REPLY_TYPE_NONE`. On the other hand, an error chunk of type
- `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about
- which earlier data chunk(s) encountered a failure, and MAY also be
- sent in lieu of a data chunk; as such, a server MAY still usefully
- follow it with further data chunks or further error offsets.
- Generally, a server SHOULD NOT mix errors with offsets with a
- generic error. As long as all errors are accompanied by offsets,
- the client MAY assume that any data chunks with no subsequent
- error are valid, that chunks with errors are valid up until the
- reported offset, and portions of the read that do not have a
- corresponding data chunk are not valid. If the final data or
- error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then
- the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to
- complete the reply, but the client MUST NOT treat this type as
- success if an earlier data chunk was sent.
-
- A client MAY close the connection if it detects that the server
- has sent invalid chunks (such as overlapping data, or not enough
- data before claiming success).
-
- In order to avoid the burden of reassembly, the client MAY set the
- `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not
- fragment the reply. If this flag is set, the server MUST send at
- most one data chunk, although it MAY still send multiple chunks
- (the remaining chunks would be error chunks or a final type of
- `NBD_REPLY_TYPE_NONE`). A server MAY reject a client's request
- with the error `EOVERFLOW` if the length is too large to send
- without fragmentation, in which case it MUST NOT send a data
- chunk; however, the server MUST NOT use this if error the client's
- requested length does not exceed 65,536 bytes.
-
## About this file
This file tries to document the NBD protocol as it is currently
--
2.5.5
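
As a rough illustration of the client-side reassembly rules for
`NBD_CMD_READ` described above (chunks in any order, no overlap,
nothing outside the request, full coverage on success), here is a
sketch under stated assumptions. The function reassemble_read and its
(type, payload) input shape are hypothetical, payloads are assumed to
be big-endian, and error chunks and `NBD_REPLY_FLAG_DONE` handling are
left out on purpose.

```python
import struct

NBD_REPLY_TYPE_OFFSET_DATA = 3
NBD_REPLY_TYPE_OFFSET_HOLE = 4


def reassemble_read(req_offset, req_length, chunks):
    """Rebuild a read buffer from (type, payload) data chunks.

    Hypothetical helper: `chunks` holds only data chunks of one reply,
    in whatever order the server sent them.
    """
    buf = bytearray(req_length)
    covered = []  # (start, end) absolute ranges already described
    for rtype, payload in chunks:
        if rtype == NBD_REPLY_TYPE_OFFSET_DATA:
            # 64-bit offset followed by at least one byte of data
            if len(payload) < 9:
                raise ValueError("data chunk must carry at least one byte")
            (offset,) = struct.unpack_from(">Q", payload)
            data = payload[8:]
        elif rtype == NBD_REPLY_TYPE_OFFSET_HOLE:
            # 64-bit offset followed by a 32-bit non-zero hole size
            if len(payload) != 12:
                raise ValueError("hole chunk payload must be 12 bytes")
            offset, size = struct.unpack(">QI", payload)
            if size == 0:
                raise ValueError("hole size must be non-zero")
            data = bytes(size)  # a hole reads as all zeroes
        else:
            raise ValueError("unexpected chunk type %d" % rtype)
        start, end = offset, offset + len(data)
        if start < req_offset or end > req_offset + req_length:
            raise ValueError("chunk outside the requested range")
        if any(start < e and s < end for s, e in covered):
            raise ValueError("overlapping chunks")
        covered.append((start, end))
        buf[start - req_offset:end - req_offset] = data
    # No overlaps and nothing out of range, so equal totals mean full coverage.
    if sum(e - s for s, e in covered) != req_length:
        raise ValueError("reply does not cover the whole request")
    return bytes(buf)
```

A client that sets `NBD_CMD_FLAG_DF` would see at most one data chunk
and could skip this bookkeeping entirely, which is the trade-off the
flag is meant to offer.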