Re: [PATCH v4] doc: Add alternate trim/zero limits
Hmm, Eric, what are the plans about this?
Now in upstream discard, write-zeros and block-status are limited in
client to 32m by default which is too slow.
Can we make some minimal specification change to drop these strict
limitations?
02.05.2018 00:22, Eric Blake wrote:
> A recent patch (6cbc0d5a) mentioned that a server that honors
> larger TRIM/WRITE_ZEROES requests than what it accepts for WRITE
> has to choose whether to advertise the maximum block size as the
> smaller limit at which it does hard disconnect for WRITE, or the
> larger limit at which it returns EINVAL for too-large trim/zero.
> Let's make the situation less ambiguous by allowing a client and
> server to negotiate explicit (larger) alternate limits for these
> two commands. We are using the fact that NBD_OPT_GO already
> documents that the client may request additional NBD_INFO items,
> the server may reply with unrequested additional NBD_INFO items,
> and that both sides must ignore NBD_INFO items that they don't
> recognize.
>
> Note that since there are existing clients that do not expect
> the new info items, the server MUST NOT fail a request that is
> unaligned to the (larger) minimum trim/zero size with EINVAL;
> only something unaligned to the main minimum block size can
> result in EINVAL. Rather, the new minimum sizes are for the
> client's information about what is efficient.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
>
> As a proof-of-concept implementation, I've prepared a patch set for
> qemu to quickly implement this; I tested that all client-server
> combinations (old-old, old-new, new-old, new-new) were able to
> negotiate, and that only in the new-new combination, qemu was
> able to take advantage of larger write zero requests.
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg00135.html
>
> v3 was here:
> https://lists.debian.org/nbd/2018/03/msg00048.html
> In v4: incorporate a few more wording tweaks based on Vladimir's review
> ---
> doc/proto.md | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 71 insertions(+), 6 deletions(-)
>
> diff --git a/doc/proto.md b/doc/proto.md
> index 6cc0fe2..32a36ba 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -799,6 +799,12 @@ size, although in that case, the client MUST treat the export size as
> the effective maximum block size (as further constrained by a nonzero
> offset).
>
> +The server MAY support individual commands that support a larger
> +length than the stated maximum block length (such as `NBD_CMD_TRIM`
> +and `NBD_CMD_WRITE_ZEROES`); in such cases, the server SHOULD
> +advertise the alternative limits as part of the information exchanged
> +during `NBD_OPT_INFO` and `NBD_OPT_GO`.
> +
> Where a transmission request can have a nonzero *offset* and/or
> *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
> the client MUST ensure that *offset* and *length* are integer
> @@ -818,8 +824,8 @@ be considered a denial of service attack (even if the advertised
> maximum block size is smaller). For all other commands, where the
> *length* is not reflected in the payload (such as `NBD_CMD_TRIM` or
> `NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with
> -an `EINVAL` error for a client that exceeds the maximum block size,
> -rather than initiating a hard disconnect.
> +an `EINVAL` error for a client that exceeds the maximum size for that
> +command, rather than initiating a hard disconnect.
>
> ## Metadata querying
>
> @@ -1518,6 +1524,53 @@ during option haggling in the fixed newstyle negotiation.
> - 32 bits, preferred block size
> - 32 bits, maximum block size
>
> + * `NBD_INFO_TRIM_SIZE` (4)
> +
> + Represents alternative limits that the server will honour during
> + `NBD_CMD_TRIM`. The server MUST NOT send this info unless it
> + also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send this info
> + unless it will also be advertising the transmission flag
> + `NBD_CMD_SEND_TRIM`. The minimum trim size SHOULD be a power of
> + 2, and MUST be at least as large as the preferred block size
> + advertised in `NBD_INFO_BLOCK_SIZE`; it represents the alignment
> + and minimum granularity that can be discarded. When a client
> + trim request is aligned to the minimum block size but not the
> + minimum trim size, the server MUST NOT fail that request with
> + `EINVAL`, but MAY round the request to the aligned subsest, if
> + any. The maximum trim size MUST be either 0xffffffff (for no
> + inherent 32-bit limit) or a multiple of the minimum trim size,
> + and MUST be at least as large as the maximum block size
> + advertised in `NBD_INFO_BLOCK_SIZE`. The *length* MUST be 10,
> + and the reply payload is interpreted as:
> +
> + - 16 bits, `NBD_INFO_TRIM_SIZE`
> + - 32 bits, minimum trim size
> + - 32 bits, maximum trim size
> +
> + * `NBD_INFO_ZERO_SIZE` (5)
> +
> + Represents alternative limits that the server will honour during
> + `NBD_CMD_WRITE_ZEROES`. The server MUST NOT send this info
> + unless it also sends `NBD_INFO_BLOCK_SIZE`, and SHOULD NOT send
> + this info unless it will also be advertising the transmission
> + flag `NBD_CMD_SEND_WRITE_ZEROES`. The minimum zero size SHOULD
> + be a power of 2, and MUST be at least as large as the preferred
> + block size advertised in `NBD_INFO_BLOCK_SIZE`; it represents
> + the alignment and minimum granularity that can be efficiently
> + written as zeroes. When a client zero request is aligned to the
> + minimum block size but not the minimum zero size, the server
> + MUST NOT fail that request with `EINVAL`, but MUST write zeroes
> + (even if by less efficient means). The maximum zero size MUST
> + be either 0xffffffff (for no inherent 32-bit limit) or a
> + multiple of the minimum zero size, and MUST be at least as large
> + as the maximum block size advertised in `NBD_INFO_BLOCK_SIZE`.
> + The *length* MUST be 10, and the reply payload is interpreted
> + as:
> +
> + - 16 bits, `NBD_INFO_ZERO_SIZE`
> + - 32 bits, minimum zero size
> + - 32 bits, maximum zero size
> +
> * `NBD_REP_META_CONTEXT` (4)
>
> A description of a metadata context. Data:
> @@ -1794,6 +1847,9 @@ The following request types exist:
> negotiated using `NBD_OPT_STRUCTURED_REPLY`. The client SHOULD NOT
> request a read length of 0; the behavior of a server on such a
> request is unspecified although the server SHOULD NOT disconnect.
> + The optional `NBD_INFO_BLOCK_SIZE` information sent during
> + `NBD_OPT_GO` can provide more details about any read alignment
> + constraints.
>
> *Simple replies*
>
> @@ -1893,7 +1949,9 @@ The following request types exist:
> *length* number of bytes to be written to the device. The client
> SHOULD NOT request a write length of 0; the behavior of a server on
> such a request is unspecified although the server SHOULD NOT
> - disconnect.
> + disconnect. The optional `NBD_INFO_BLOCK_SIZE` information sent
> + during `NBD_OPT_GO` can provide more details about any write
> + alignment constraints.
>
> The server MUST write the data to disk, and then send the reply
> message. The server MAY send the reply message before the data has
> @@ -1936,7 +1994,9 @@ The following request types exist:
> a portion of the client's request is actually discarded. The
> client SHOULD NOT request a trim length of 0; the behavior of a
> server on such a request is unspecified although the server SHOULD
> - NOT disconnect.
> + NOT disconnect. The optional `NBD_INFO_BLOCK_SIZE` and
> + `NBD_INFO_TRIM_SIZE` information sent during `NBD_OPT_GO` can
> + provide more details about any trim alignment constraints.
>
> After issuing this command, a client MUST NOT make any assumptions
> about the contents of the export affected by this command, until
> @@ -1965,7 +2025,10 @@ The following request types exist:
> command flag `NBD_CMD_FLAG_NO_HOLE` to inform the server that the
> area MUST be fully provisioned, ensuring that future writes to the
> same area will not cause fragmentation or cause failure due to
> - insufficient space.
> + insufficient space. The optional `NBD_INFO_BLOCK_SIZE` and
> + `NBD_INFO_ZERO_SIZE` information sent during `NBD_OPT_GO` can
> + provide more details for using alignments that can optimize the
> + speed at which a zero request can be handled.
>
> If an error occurs, the server MUST set the appropriate error code
> in the error field.
> @@ -1979,7 +2042,9 @@ The following request types exist:
> A block status query request. Length and offset define the range
> of interest. The client SHOULD NOT request a status length of 0;
> the behavior of a server on such a request is unspecified although
> - the server SHOULD NOT disconnect.
> + the server SHOULD NOT disconnect. The optional
> + `NBD_INFO_BLOCK_SIZE` information sent during `NBD_OPT_GO` can
> + provide more details about any status alignment constraints.
>
> A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless within the
> negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` at least
--
Best regards,
Vladimir
Reply to: