[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH v2 3/3] doc: Add alternate trim/zero limits



15.03.2018 00:16, Eric Blake wrote:
The previous patch mentioned that a server that honors larger
TRIM/WRITE_ZEROES requests than accepted for WRITE has to choose
whether to advertise the maximum block size as the smaller limit
at which it does hard disconnect for WRITE, or the larger limit
at which it returns EINVAL for too-large trim/zero.  Let's make
the situation less ambiguous by allowing a client and server to
negotiate explicit (larger) alternate limits for these two
commands, using the fact that NBD_OPT_GO already documents that
the client may request additional NBD_INFO items, the server may
reply with unrequested additional NBD_INFO items, and that both
sides must ignore NBD_INFO items that they don't recognize.

Note that since there are existing clients that do not expect
the new info items, the server MUST NOT fail a request that is
unaligned to the (larger) minimum trim/zero size with EINVAL;
only something unaligned to the main minimum block size can
result in EINVAL.  Rather, the new minimum sizes are for the
client's information about what is efficient.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: wording tweaks, don't allow EINVAL failure on requests that
meet primary alignment but not alternate alignment, more text
added for the sake of cross-references
[1]

---
  doc/proto.md | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
  1 file changed, 67 insertions(+), 4 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 5dd3b53..bb5d513 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -799,6 +799,12 @@ size, although in that case, the client MUST treat the export size as
  the effective maximum block size (as further constrained by a nonzero
  offset).

+The server MAY support individual commands that support a larger
+length than the stated maximum block length (such as `NBD_CMD_TRIM`
+and `NBD_CMD_WRITE_ZEROES`); in such cases, the server SHOULD
+advertise the alternative limits as part of the information exchanged
+during `NBD_OPT_GO`.

and NBD_OPT_INFO ?

+
  Where a transmission request can have a nonzero *offset* and/or
  *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
  the client MUST ensure that *offset* and *length* are integer
@@ -818,8 +824,8 @@ be considered a denial of service attack (even if the advertised
  maximum block size is smaller).  For all other commands, where the
  *length* is not reflected in the payload (such as `NBD_CMD_TRIM` or
  `NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with
-an `EINVAL` error for a client that exceeds the maximum block size,
-rather than initiating a hard disconnect.
+an `EINVAL` error for a client that exceeds the maximum size for that
+command, rather than initiating a hard disconnect.

looks like don't correspond to [1]


  ## Values

@@ -1254,6 +1260,58 @@ during option haggling in the fixed newstyle negotiation.
        - 32 bits, preferred block size
        - 32 bits, maximum block size

+    * `NBD_INFO_TRIM_SIZE` (4)
+
+      Represents alternative limits that the server will honour during
+      `NBD_CMD_TRIM`.  The server MUST NOT send this info unless it
+      also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send this info
+      unless it will also be advertising the transmission flag
+      `NBD_CMD_SEND_TRIM`.  The minimum trim size SHOULD be a power of
+      2, and MUST be at least as large as the preferred block size
+      advertised in `NBD_INFO_BLOCK_SIZE`; it represents the alignment
+      and minimum granularity that can be discarded.  When a client
+      request

TRIM request

is aligned to the minimum block size but not the minimum
+      trim size, the server MUST NOT fail that request with `EINVAL`,
+      but SHOULD ignore the unaligned portion of the request (and trim
+      only the aligned subset, if any).
don't it violate severs, which you mention in first patch commit message,
"(for specific implementations where the client knows via external means that
the server will guarantee a reads-as-zero after the trim
completes)" ?


   The maximum trim size MUST be
+      either 0xffffffff (for no inherent 32-bit limit) or a multiple
+      of the minimum trim size, and MUST be at least as large as the
+      maximum block size advertised in `NBD_INFO_BLOCK_SIZE`.  If the
+      server does not advertise alternative trim limits, then the
+      client SHOULD limit `NBD_CMD_TRIM` alignment and sizes to the
+      same limits as any other command.  The *length* MUST be 10, and
+      the reply payload is interpreted as:
+
+      - 16 bits, `NBD_INFO_TRIM_SIZE`
+      - 32 bits, minimum trim size
+      - 32 bits, maximum trim size
+
+    * `NBD_INFO_ZERO_SIZE` (5)
+
+      Represents alternative limits that the server will honour during
+      `NBD_CMD_WRITE_ZEROES`.  The server MUST NOT send this info
+      unless it also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send
+      this info unless it will also be advertising the transmission
+      flag `NBD_CMD_SEND_WRITE_ZEROES`.  The minimum zero size SHOULD
+      be a power of 2, and MUST be at least as large as the preferred
+      block size advertised in `NBD_INFO_BLOCK_SIZE`; it represents
+      the alignment and minimum granularity that can be efficiently
+      written as zeroes.  When a client request

WRITE_ZEROES request

  is aligned to the
+      minimum block size but not the minimum zero size, the server
+      MUST NOT fail that request with `EINVAL`, but MUST write zeroes
+      (even if by less efficient means).  The maximum zero size MUST
+      be either 0xffffffff (for no inherent 32-bit limit) or a
+      multiple of the minimum zero size, and MUST be at least as large
+      as the maximum block size advertised in `NBD_INFO_BLOCK_SIZE`.
+      If the server does not advertise alternative zero limits, then
+      the client SHOULD limit `NBD_CMD_WRITE_ZEROES` alignment and
+      sizes to the same limits as any other command.  The *length*
+      MUST be 10, and the reply payload is interpreted as:
+
+      - 16 bits, `NBD_INFO_ZERO_SIZE`
+      - 32 bits, minimum zero size
+      - 32 bits, maximum zero size
+
  * `NBD_REP_META_CONTEXT` (4)

[...]

--
Best regards,
Vladimir


Reply to: