[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

[Nbd] [RFC PATCH] doc: Add new NBD_FLAG_BLOCK_SIZE extension



Existing NBD servers often have limitations, such as requiring
actions to be aligned to block sizes or limiting maximum
transactions to avoid denial of service attacks; for example,
qemu's NBD server refuses any transaction larger than 32M.  But
to date, clients have to learn these limitations via out-of-band
means.

This adds a new negotiation flag to let client and server agree
to alter the response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO to
advertise the server's limitations, as well as documenting sane
defaults that a client should follow if the information was not
documented.

This is done as a global flag rather than a new NBD_OPT, for a
couple of reasons.  First, block size information can be
implemented even without support for NBD_FLAG_FIXED_NEWSTYLE,
because of the reserved block of zeroes sent in response to
NBD_CMD_EXPORT_NAME.  Second, negotiating the option changes
the NBD_REP_SERVER message size sent by the server for
NBD_OPT_INFO; although this reply is structured and the length
will be reliable, it would still be awkward for out-of-order
option replies from the server to ping-pong in size within a
single session.  Making the negotiation part of the initial
global flag handshake means the rest of the session has
consistent message sizing.

Signed-off-by: Eric Blake <eblake@...696...>
---

Perhaps I should try harder to make the an experimental extension,
rather than directly placing it into the normative section.  But
I wanted to first get it out for an initial review on whether the
ideas presented make sense.


Requires these patches to be applied first:
[AB] [PATCHv8] Improve documentation for TLS
[AB*] [PATCHv6] docs/proto.md: Clarify SHOULD / MUST / MAY etc
[EB] [PATCHv2] doc: In STRUCTURED_REPLY, make error types easy to recognize

 doc/proto.md | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 125 insertions(+), 22 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 06a275d..f721db9 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -143,12 +143,24 @@ understand.

 If the value of the option field is `NBD_OPT_EXPORT_NAME` and the server
 is willing to allow the export, the server replies with information
-about the used export:
+about the used export.  The amount of information sent is determined
+by what client flags were succesfully negotiated (134 bytes if
+`NBD_FLAG_C_NO_ZEROES` is clear, otherwise 22 bytes if
+`NBD_FLAG_C_BLOCK_SIZE` is set, otherwise 10 bytes).

 S: 64 bits, size of the export in bytes (unsigned)  
 S: 16 bits, transmission flags  
-S: 124 bytes, zeroes (reserved) (unless `NBD_FLAG_C_NO_ZEROES` was
-   negotiated by the client)  
+S: 32 bits, minimum block size (valid if `NBD_FLAG_C_BLOCK_SIZE`
+   was negotiated, otherwise omitted if `NBD_FLAG_C_NO_ZEROES` was
+   negotiated, otherwise zero)  
+S: 32 bits, preferred block size (valid if `NBD_FLAG_C_BLOCK_SIZE`
+   was negotiated, otherwise omitted if `NBD_FLAG_C_NO_ZEROES` was
+   negotiated, otherwise zero)  
+S: 32 bits, maximum block size (valid if `NBD_FLAG_C_BLOCK_SIZE`
+   was negotiated, otherwise omitted if `NBD_FLAG_C_NO_ZEROES` was
+   negotiated, otherwise zero)  
+S: 112 bytes, zeroes (reserved) (omitted `NBD_FLAG_C_NO_ZEROES` was
+   negotiated)  

 If the server is unwilling to allow the export, it SHOULD close the
 connection.
@@ -162,6 +174,40 @@ bit, signalling that an extra flag field will follow, to which the
 client will have to reply with a flag field of its own before the extra
 flags are sent. This is not yet implemented.

+The three block size fields represent constraints that the client
+SHOULD obey during transmission phase, as further detailed in
+individual transmission requests.  If `NBD_FLAG_C_BLOCK_SIZE` was not
+negotiated, the server MUST use zero for all three fields (or omit
+them, when `NBD_FLAG_C_NO_ZEROES` is negotiated), and the client
+SHOULD assume a minimum block size of 512 bytes, a maximum block size
+of 33554432 (32M) bytes (unless the export size is smaller), and no
+hint on a preferred block size.  Otherwise, the three values MUST each
+be non-zero, with the preferred size at least as large as the minimum,
+and the maximum at least as large as the preferred size, and the
+export size at least as large as the maximum.
+
+The minimum block size MUST be a power of 2, and represents the
+smallest addressable length and alignment within the export, even if
+writing a block that small requires the server to use a less-efficient
+read-modify-write action.  This value MAY be as small as 1 if the
+export is backed by a regular file, although the values of 512 or 4096
+are more typical for an export backed by a block device.  When block
+sizes are negotiated, the server MUST return an export size that is an
+integer multiple of the minimum block size.
+
+The preferred block size MUST be an integer multiple of the minimum
+block size, SHOULD be either a power of 2 or equal to the export size,
+and MUST NOT be larger than 33554432 (32M) nor the export size.  It
+represents the minimal length and alignment required for optimal I/O
+access; a typical size might be 65536.  The export size MAY be other
+than an integer multiple of the preferred size.
+
+The maximum block size represents the largest length that the server
+is willing to handle in a single transaction.  It MUST be an integer
+multiple of the minimum block size, MUST be at least the smaller of
+33554432 (32M) or the export size, and MUST NOT be larger than the
+export size.
+
 #### Fixed newstyle negotiation

 Unfortunately, due to a mistake, the server would immediately close the
@@ -600,8 +646,15 @@ the first magic number.
 - bit 0, `NBD_FLAG_FIXED_NEWSTYLE`; MUST be set by servers that
   support the fixed newstyle protocol
 - bit 1, `NBD_FLAG_NO_ZEROES`; if set, and if the client replies with
-  `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST NOT
-  send the 124 bytes of zero at the end of the negotiation.
+  `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST
+  NOT send trailing bytes of zero in response to `NBD_OPT_EXPORT_NAME`
+  at the end of the negotiation (omitting either 112 or 124 bytes,
+  depending on `NBD_FLAG_C_BLOCK_SIZE`).
+- bit 2, `NBD_FLAG_BLOCK_SIZE`; if set, and if the client replies with
+  `NBD_FLAG_C_BLOCK_SIZE` in the client flags field, the server MUST
+  advertize non-zero block sizes in response to `NBD_OPT_EXPORT_NAME`,
+  `NBD_OPT_INFO`, and `NBD_OPT_GO`, and clients MUST obey these
+  constraints during the transmission phase.

 The server MUST NOT set any other flags, and SHOULD NOT change behaviour
 unless the client responds with a corresponding flag.  The server MUST
@@ -652,8 +705,12 @@ receiving the handshake flags from the server.
   fixed newstyle from clients that didn't set this bit, but relying on
   this isn't recommended.
 - bit 1, `NBD_FLAG_C_NO_ZEROES`; MUST NOT be set if the server did not
-  set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124
-  bytes of zeroes at the end of the negotiation.
+  set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 112
+  or 124 bytes of zeroes after `NBD_OPT_EXPORT_NAME`.
+- bit 2, `NBD_FLAG_C_BLOCK_SIZE`; MUST NOT be set if the server did not
+  set `NBD_FLAG_BLOCK_SIZE`. If set, the server MUST send block size
+  information as part of a successful reply to `NBD_OPT_EXPORT_NAME`,
+  `NBD_OPT_INFO`, and `NBD_OPT_GO`.

 Clients MUST NOT set any other flags; the server MUST drop the
 connection if the client sets an unknown flag, or a flag that does
@@ -848,6 +905,14 @@ The following request types exist:
     of data, read from *offset* bytes into the file, unless an error
     condition has occurred.

+    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
+    that *offset* and *length* are integer multiples of the minimum
+    block size, and SHOULD ensure that they are integer multiples of
+    the preferred block size where possible.  The client SHOULD NOT
+    request a length larger than the maximum block size.  The server
+    SHOULD report an `EINVAL` error if the client does not honor block
+    sizes.
+
     If an error occurs, the server SHOULD set the appropriate error
     code in the error field. The server MUST then either close the
     connection, or send *length* bytes of data (these bytes MAY be
@@ -871,6 +936,14 @@ The following request types exist:
     data to be written. The client MUST follow the request header with
     *length* number of bytes to be written to the device.

+    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
+    that *offset* and *length* are integer multiples of the minimum
+    block size, and SHOULD ensure that they are integer multiples of
+    the preferred block size where possible.  The client SHOULD NOT
+    request a length larger than the maximum block size.  The server
+    SHOULD report an `EINVAL` error if the client does not honor block
+    sizes.
+
     The server MUST write the data to disk, and then send the reply
     message. The server MAY send the reply message before the data has
     reached permanent storage.
@@ -906,6 +979,14 @@ The following request types exist:
     longer needed. A server MAY discard len bytes starting at offset, but
     is not required to.

+    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
+    that *offset* and *length* are integer multiples of the minimum
+    block size, and SHOULD ensure that they are integer multiples of
+    the preferred block size where possible.  The client SHOULD NOT
+    request a length larger than the maximum block size.  The server
+    SHOULD report an `EINVAL` error if the client does not honor block
+    sizes.
+
     After issuing this command, a client MUST NOT make any assumptions
     about the contents of the export affected by this command, until
     overwriting it again with `NBD_CMD_WRITE` or `NBD_CMD_WRITE_ZEROES`.
@@ -952,11 +1033,13 @@ The following error values are defined:
   experimental `STRUCTURED_REPLY` extension; see below.

 The server SHOULD return `ENOSPC` if it receives a write request
-including one or more sectors beyond the size of the device.  It SHOULD
-return `EINVAL` if it receives a read or trim request including one or
-more sectors beyond the size of the device.  It also SHOULD map the
-`EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
-`EPERM` if it receives a write or trim request on a read-only export.
+including one or more sectors beyond the size of the device.  It
+SHOULD return `EINVAL` if it receives a read or trim request including
+one or more sectors beyond the size of the device, or if a read or
+write request does not comply with advertised minimum and maximum
+block sizes.  It also SHOULD map the `EDQUOT` and `EFBIG` errors to
+`ENOSPC`.  Finally, it SHOULD return `EPERM` if it receives a write or
+trim request on a read-only export.

 The server SHOULD return `EINVAL` if it receives an unknown command.

@@ -1042,7 +1125,9 @@ Therefore these commands share common documentation.
     interpretation than the default (so as to provide additional
     binary information normally sent in reply to `NBD_OPT_EXPORT_NAME`,
     in place of the default free-form string). The option reply length
-    MUST be *length of name* + 14, and the option data has the following layout:
+    MUST be *length of name* + 14 if `NBD_FLAG_C_BLOCK_SIZE` was not
+    negotiated, or *length of name* + 26 if it was.  The option data
+    has the following layout:

     - 32 bits, length of name (unsigned)  
     - String: name of the export. This name MAY be different from the one
@@ -1050,7 +1135,10 @@ Therefore these commands share common documentation.
       server has multiple alternate names for a single export, or a
       default export was specified.  
     - 64 bits, size of the export in bytes (unsigned)  
-    - 16 bits, transmission flags.  
+    - 16 bits, transmission flags  
+    - 32 bits, minimum block size (only if `NBD_FLAG_C_BLOCK_SIZE`)  
+    - 32 bits, preferred block size (only if `NBD_FLAG_C_BLOCK_SIZE`)  
+    - 32 bits, maximum block size (only if `NBD_FLAG_C_BLOCK_SIZE`)  

     If there are no intervening option requests between a successful
     `NBD_OPT_INFO` (that is, one that replied with `NBD_REP_SERVER`)
@@ -1095,6 +1183,14 @@ command flag.
     A write request with no payload. *Offset* and *length* define the location
     and amount of data to be zeroed.

+    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
+    that *offset* and *length* are integer multiples of the minimum
+    block size, and SHOULD ensure that they are integer multiples of
+    the preferred block size where possible.  The client SHOULD NOT
+    request a length larger than the maximum block size.  The server
+    SHOULD report an `EINVAL` error if the client does not honor block
+    sizes.
+
     The server MUST zero out the data on disk, and then send the reply
     message. The server MAY send the reply message before the data has
     reached permanent storage.
@@ -1347,8 +1443,9 @@ error, and alters the reply to the `NBD_CMD_READ` request.
     The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
     client has requested `NBD_CMD_FLAG_DF` for a length that is too
     large to read without fragmentation.  The server MUST NOT return
-    this error if the read request did not exceed 65,536 bytes, and
-    SHOULD NOT return this error if `NBD_CMD_FLAG_DF` is not set.
+    this error if the read request did not exceed the smaller of 65,536
+    bytes or of the negotiated preferred block size, and SHOULD NOT
+    return this error if `NBD_CMD_FLAG_DF` is not set.

 * `NBD_CMD_READ`

@@ -1371,12 +1468,18 @@ error, and alters the reply to the `NBD_CMD_READ` request.
     overhead, the server SHOULD use chunks with lengths and offsets as
     an integer multiple of 512 bytes, where possible (the first and
     last chunk of an unaligned read being the most obvious places for
-    an exception).  The server MUST NOT send content chunks that
-    overlap with any earlier content or error chunk, and MUST NOT send
-    chunks that describe data outside the offset and length of the
-    request, but MAY send the content chunks in any order (the client
-    MUST reassemble content chunks into the correct order), and MAY
-    send additional content chunks even after reporting an error chunk.
+    an exception).  If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`,
+    the server MUST ensure that every chunk has a length and offset
+    are an integer multiple of the advertised minimum block size, and
+    SHOULD use chunks with multiples of the preferred block size where
+    possible.
+
+    The server MUST NOT send content chunks that overlap with any
+    earlier content or error chunk, and MUST NOT send chunks that
+    describe data outside the offset and length of the request, but
+    MAY send the content chunks in any order (the client MUST
+    reassemble content chunks into the correct order), and MAY send
+    additional content chunks even after reporting an error chunk.
     Note that a request for more than 2^32 - 8 bytes MUST be split
     into at least two chunks, so as not to overflow the length field
     of a reply while still allowing space for the offset of each
-- 
2.5.5




Reply to: