[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

[PATCH v3 4/5] doc: Add alternate trim/zero limits



The previous patch mentioned that a server that honors larger
TRIM/WRITE_ZEROES requests than accepted for WRITE has to choose
whether to advertise the maximum block size as the smaller limit
at which it does hard disconnect for WRITE, or the larger limit
at which it returns EINVAL for too-large trim/zero.  Let's make
the situation less ambiguous by allowing a client and server to
negotiate explicit (larger) alternate limits for these two
commands, using the fact that NBD_OPT_GO already documents that
the client may request additional NBD_INFO items, the server may
reply with unrequested additional NBD_INFO items, and that both
sides must ignore NBD_INFO items that they don't recognize.

Note that since there are existing clients that do not expect
the new info items, the server MUST NOT fail a request that is
unaligned to the (larger) minimum trim/zero size with EINVAL;
only something unaligned to the main minimum block size can
result in EINVAL.  Rather, the new minimum sizes are for the
client's information about what is efficient.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: a few more wording tweaks
v2: wording tweaks, don't allow EINVAL failure on requests that
meet primary minimum alignment but not alternate alignment, more
text added for the sake of cross-references
---
 doc/proto.md | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 67 insertions(+), 4 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 4b6f638..b5f21e3 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -797,6 +797,12 @@ maximum block size that is larger than the export size, although in
 that case, the client MUST treat the export size as the effective
 maximum block size (as further constrained by a nonzero offset).

+The server MAY support individual commands that support a larger
+length than the stated maximum block length (such as `NBD_CMD_TRIM`
+and `NBD_CMD_WRITE_ZEROES`); in such cases, the server SHOULD
+advertise the alternative limits as part of the information exchanged
+during `NBD_OPT_INFO` and `NBD_OPT_GO`.
+
 Where a transmission request can have a nonzero *offset* and/or
 *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
 the client MUST ensure that *offset* and *length* are integer
@@ -816,8 +822,8 @@ be considered a denial of service attack (even if the advertised
 maximum block size is smaller).  For all other commands, where the
 *length* is not reflected in the payload (such as `NBD_CMD_TRIM` or
 `NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with
-an `EINVAL` error for a client that exceeds the maximum block size,
-rather than initiating a hard disconnect.
+an `EINVAL` error for a client that exceeds the maximum size for that
+command, rather than initiating a hard disconnect.

 ## Values

@@ -1252,6 +1258,58 @@ during option haggling in the fixed newstyle negotiation.
       - 32 bits, preferred block size  
       - 32 bits, maximum block size  

+    * `NBD_INFO_TRIM_SIZE` (4)
+
+      Represents alternative limits that the server will honour during
+      `NBD_CMD_TRIM`.  The server MUST NOT send this info unless it
+      also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send this info
+      unless it will also be advertising the transmission flag
+      `NBD_CMD_SEND_TRIM`.  The minimum trim size SHOULD be a power of
+      2, and MUST be at least as large as the preferred block size
+      advertised in `NBD_INFO_BLOCK_SIZE`; it represents the alignment
+      and minimum granularity that can be discarded.  When a client
+      trim request is aligned to the minimum block size but not the
+      minimum trim size, the server MUST NOT fail that request with
+      `EINVAL`, but MAY round the request to the aligned subsest, if
+      any.  The maximum trim size MUST be either 0xffffffff (for no
+      inherent 32-bit limit) or a multiple of the minimum trim size,
+      and MUST be at least as large as the maximum block size
+      advertised in `NBD_INFO_BLOCK_SIZE`.  If the server does not
+      advertise alternative trim limits, then the client SHOULD limit
+      `NBD_CMD_TRIM` alignment and sizes to the same limits as any
+      other command.  The *length* MUST be 10, and the reply payload
+      is interpreted as:
+
+      - 16 bits, `NBD_INFO_TRIM_SIZE`
+      - 32 bits, minimum trim size
+      - 32 bits, maximum trim size
+
+    * `NBD_INFO_ZERO_SIZE` (5)
+
+      Represents alternative limits that the server will honour during
+      `NBD_CMD_WRITE_ZEROES`.  The server MUST NOT send this info
+      unless it also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send
+      this info unless it will also be advertising the transmission
+      flag `NBD_CMD_SEND_WRITE_ZEROES`.  The minimum zero size SHOULD
+      be a power of 2, and MUST be at least as large as the preferred
+      block size advertised in `NBD_INFO_BLOCK_SIZE`; it represents
+      the alignment and minimum granularity that can be efficiently
+      written as zeroes.  When a client zero request is aligned to the
+      minimum block size but not the minimum zero size, the server
+      MUST NOT fail that request with `EINVAL`, but MUST write zeroes
+      (even if by less efficient means).  The maximum zero size MUST
+      be either 0xffffffff (for no inherent 32-bit limit) or a
+      multiple of the minimum zero size, and MUST be at least as large
+      as the maximum block size advertised in `NBD_INFO_BLOCK_SIZE`.
+      If the server does not advertise alternative zero limits, then
+      the client SHOULD limit `NBD_CMD_WRITE_ZEROES` alignment and
+      sizes to the same limits as any other command.  The *length*
+      MUST be 10, and the reply payload is interpreted as:
+
+      - 16 bits, `NBD_INFO_ZERO_SIZE`
+      - 32 bits, minimum zero size
+      - 32 bits, maximum zero size
+
 * `NBD_REP_META_CONTEXT` (4)

     Defined by the experimental `BLOCK_STATUS` [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md).
@@ -1634,7 +1692,9 @@ The following request types exist:
     a portion of the client's request is actually discarded. The
     client SHOULD NOT request a trim length of 0; the behavior of a
     server on such a request is unspecified although the server SHOULD
-    NOT disconnect.
+    NOT disconnect.  The optional `NBD_INFO_TRIM_SIZE` information
+    sent during `NBD_OPT_GO` can provide more details about any trim
+    alignment constraints.

     After issuing this command, a client MUST NOT make any assumptions
     about the contents of the export affected by this command, until
@@ -1663,7 +1723,10 @@ The following request types exist:
     command flag `NBD_CMD_FLAG_NO_HOLE` to inform the server that the
     area MUST be fully provisioned, ensuring that future writes to the
     same area will not cause fragmentation or cause failure due to
-    insufficient space.
+    insufficient space.  The optional `NBD_INFO_ZERO_SIZE` information
+    sent during `NBD_OPT_GO` can provide more details for using
+    alignments that can optimize the speed at which a zero request can
+    be handled.

     If an error occurs, the server MUST set the appropriate error code
     in the error field.
-- 
2.14.3


Reply to: