[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH v4] doc: Add alternate trim/zero limits



Hmm, Eric, what are the plans about this?

Now in upstream discard, write-zeros and block-status are limited in 
client to 32m by default which is too slow.
Can we make some minimal specification change to drop these strict 
limitations?

02.05.2018 00:22, Eric Blake wrote:
> A recent patch (6cbc0d5a) mentioned that a server that honors
> larger TRIM/WRITE_ZEROES requests than what it accepts for WRITE
> has to choose whether to advertise the maximum block size as the
> smaller limit at which it does hard disconnect for WRITE, or the
> larger limit at which it returns EINVAL for too-large trim/zero.
> Let's make the situation less ambiguous by allowing a client and
> server to negotiate explicit (larger) alternate limits for these
> two commands.  We are using the fact that NBD_OPT_GO already
> documents that the client may request additional NBD_INFO items,
> the server may reply with unrequested additional NBD_INFO items,
> and that both sides must ignore NBD_INFO items that they don't
> recognize.
>
> Note that since there are existing clients that do not expect
> the new info items, the server MUST NOT fail a request that is
> unaligned to the (larger) minimum trim/zero size with EINVAL;
> only something unaligned to the main minimum block size can
> result in EINVAL.  Rather, the new minimum sizes are for the
> client's information about what is efficient.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
>
> As a proof-of-concept implementation, I've prepared a patch set for
> qemu to quickly implement this; I tested that all client-server
> combinations (old-old, old-new, new-old, new-new) were able to
> negotiate, and that only in the new-new combination, qemu was
> able to take advantage of larger write zero requests.
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg00135.html
>
> v3 was here:
> https://lists.debian.org/nbd/2018/03/msg00048.html
> In v4: incorporate a few more wording tweaks based on Vladimir's review
> ---
>   doc/proto.md | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 71 insertions(+), 6 deletions(-)
>
> diff --git a/doc/proto.md b/doc/proto.md
> index 6cc0fe2..32a36ba 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -799,6 +799,12 @@ size, although in that case, the client MUST treat the export size as
>   the effective maximum block size (as further constrained by a nonzero
>   offset).
>
> +The server MAY support individual commands that support a larger
> +length than the stated maximum block length (such as `NBD_CMD_TRIM`
> +and `NBD_CMD_WRITE_ZEROES`); in such cases, the server SHOULD
> +advertise the alternative limits as part of the information exchanged
> +during `NBD_OPT_INFO` and `NBD_OPT_GO`.
> +
>   Where a transmission request can have a nonzero *offset* and/or
>   *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
>   the client MUST ensure that *offset* and *length* are integer
> @@ -818,8 +824,8 @@ be considered a denial of service attack (even if the advertised
>   maximum block size is smaller).  For all other commands, where the
>   *length* is not reflected in the payload (such as `NBD_CMD_TRIM` or
>   `NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with
> -an `EINVAL` error for a client that exceeds the maximum block size,
> -rather than initiating a hard disconnect.
> +an `EINVAL` error for a client that exceeds the maximum size for that
> +command, rather than initiating a hard disconnect.
>
>   ## Metadata querying
>
> @@ -1518,6 +1524,53 @@ during option haggling in the fixed newstyle negotiation.
>         - 32 bits, preferred block size
>         - 32 bits, maximum block size
>
> +    * `NBD_INFO_TRIM_SIZE` (4)
> +
> +      Represents alternative limits that the server will honour during
> +      `NBD_CMD_TRIM`.  The server MUST NOT send this info unless it
> +      also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send this info
> +      unless it will also be advertising the transmission flag
> +      `NBD_CMD_SEND_TRIM`.  The minimum trim size SHOULD be a power of
> +      2, and MUST be at least as large as the preferred block size
> +      advertised in `NBD_INFO_BLOCK_SIZE`; it represents the alignment
> +      and minimum granularity that can be discarded.  When a client
> +      trim request is aligned to the minimum block size but not the
> +      minimum trim size, the server MUST NOT fail that request with
> +      `EINVAL`, but MAY round the request to the aligned subsest, if
> +      any.  The maximum trim size MUST be either 0xffffffff (for no
> +      inherent 32-bit limit) or a multiple of the minimum trim size,
> +      and MUST be at least as large as the maximum block size
> +      advertised in `NBD_INFO_BLOCK_SIZE`.  The *length* MUST be 10,
> +      and the reply payload is interpreted as:
> +
> +      - 16 bits, `NBD_INFO_TRIM_SIZE`
> +      - 32 bits, minimum trim size
> +      - 32 bits, maximum trim size
> +
> +    * `NBD_INFO_ZERO_SIZE` (5)
> +
> +      Represents alternative limits that the server will honour during
> +      `NBD_CMD_WRITE_ZEROES`.  The server MUST NOT send this info
> +      unless it also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send
> +      this info unless it will also be advertising the transmission
> +      flag `NBD_CMD_SEND_WRITE_ZEROES`.  The minimum zero size SHOULD
> +      be a power of 2, and MUST be at least as large as the preferred
> +      block size advertised in `NBD_INFO_BLOCK_SIZE`; it represents
> +      the alignment and minimum granularity that can be efficiently
> +      written as zeroes.  When a client zero request is aligned to the
> +      minimum block size but not the minimum zero size, the server
> +      MUST NOT fail that request with `EINVAL`, but MUST write zeroes
> +      (even if by less efficient means).  The maximum zero size MUST
> +      be either 0xffffffff (for no inherent 32-bit limit) or a
> +      multiple of the minimum zero size, and MUST be at least as large
> +      as the maximum block size advertised in `NBD_INFO_BLOCK_SIZE`.
> +      The *length* MUST be 10, and the reply payload is interpreted
> +      as:
> +
> +      - 16 bits, `NBD_INFO_ZERO_SIZE`
> +      - 32 bits, minimum zero size
> +      - 32 bits, maximum zero size
> +
>   * `NBD_REP_META_CONTEXT` (4)
>
>       A description of a metadata context. Data:
> @@ -1794,6 +1847,9 @@ The following request types exist:
>       negotiated using `NBD_OPT_STRUCTURED_REPLY`. The client SHOULD NOT
>       request a read length of 0; the behavior of a server on such a
>       request is unspecified although the server SHOULD NOT disconnect.
> +    The optional `NBD_INFO_BLOCK_SIZE` information sent during
> +    `NBD_OPT_GO` can provide more details about any read alignment
> +    constraints.
>
>       *Simple replies*
>
> @@ -1893,7 +1949,9 @@ The following request types exist:
>       *length* number of bytes to be written to the device. The client
>       SHOULD NOT request a write length of 0; the behavior of a server on
>       such a request is unspecified although the server SHOULD NOT
> -    disconnect.
> +    disconnect.  The optional `NBD_INFO_BLOCK_SIZE` information sent
> +    during `NBD_OPT_GO` can provide more details about any write
> +    alignment constraints.
>
>       The server MUST write the data to disk, and then send the reply
>       message. The server MAY send the reply message before the data has
> @@ -1936,7 +1994,9 @@ The following request types exist:
>       a portion of the client's request is actually discarded. The
>       client SHOULD NOT request a trim length of 0; the behavior of a
>       server on such a request is unspecified although the server SHOULD
> -    NOT disconnect.
> +    NOT disconnect.  The optional `NBD_INFO_BLOCK_SIZE` and
> +    `NBD_INFO_TRIM_SIZE` information sent during `NBD_OPT_GO` can
> +    provide more details about any trim alignment constraints.
>
>       After issuing this command, a client MUST NOT make any assumptions
>       about the contents of the export affected by this command, until
> @@ -1965,7 +2025,10 @@ The following request types exist:
>       command flag `NBD_CMD_FLAG_NO_HOLE` to inform the server that the
>       area MUST be fully provisioned, ensuring that future writes to the
>       same area will not cause fragmentation or cause failure due to
> -    insufficient space.
> +    insufficient space.  The optional `NBD_INFO_BLOCK_SIZE` and
> +    `NBD_INFO_ZERO_SIZE` information sent during `NBD_OPT_GO` can
> +    provide more details for using alignments that can optimize the
> +    speed at which a zero request can be handled.
>
>       If an error occurs, the server MUST set the appropriate error code
>       in the error field.
> @@ -1979,7 +2042,9 @@ The following request types exist:
>       A block status query request. Length and offset define the range
>       of interest.  The client SHOULD NOT request a status length of 0;
>       the behavior of a server on such a request is unspecified although
> -    the server SHOULD NOT disconnect.
> +    the server SHOULD NOT disconnect.  The optional
> +    `NBD_INFO_BLOCK_SIZE` information sent during `NBD_OPT_GO` can
> +    provide more details about any status alignment constraints.
>
>       A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
>       negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least


-- 
Best regards,
Vladimir


Reply to: