Re: [PATCH 2/2] doc: Add alternate trim/zero limits



On 02/28/2018 05:20 PM, Eric Blake wrote:
The previous patch mentioned that a server that honors larger
TRIM/WRITE_ZEROES requests than accepted for WRITE has to choose
whether to advertise the maximum block size as the smaller limit
at which it does hard disconnect for WRITE, or the larger limit
at which it returns EINVAL for too-large trim/zero.  Let's make
the situation less ambiguous by allowing a client and server to
negotiate explicit alternate limits for these two commands,
using the fact that NBD_OPT_GO already requires both client and
server to request additional NBD_INFO items, and to ignore items
that they don't recognize.


+    * `NBD_INFO_ZERO_SIZE` (5)
+
+      Represents alternate limits that the server will honour during
+      `NBD_CMD_WRITE_ZEROES`.  The server SHOULD NOT send this info
+      unless it will also be advertising the transmission flag
+      `NBD_CMD_SEND_WRITE_ZEROES`.  The minimum zero size SHOULD be a
+      power of 2, and MUST be at least as large as the preferred block
+      size advertised in `NBD_INFO_BLOCK_SIZE`; it represents the
+      alignment and minimum granularity that can be efficiently
+      written as zeroes (a server that receives a zero request not
+      aligned to these boundaries MAY reject the request with an
+      error; or MAY perform the request using slower means such as
+      read-modify-write).

Ouch, this is not friendly to older clients that do not know how to interpret NBD_INFO_ZERO_SIZE. Such a client could send a request aligned to the normal NBD_INFO_BLOCK_SIZE alignments, expecting it to work, and get a failure instead. I think a better wording is that the alternate sizes cannot be advertised unless the primary sizes are also present, and that as long as a request satisfies the primary alignments, the server must perform the write zeroes, even if by less efficient means (reserving the EINVAL alignment error for requests that do not satisfy the primary alignments). That is, the minimum zero size is not so much a hard limit as a hint for where writing zeroes becomes more efficient, because the server can punch holes instead of naively writing zeroes.
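
To make the proposed semantics concrete, here is a rough sketch of how a server might plan an NBD_CMD_WRITE_ZEROES request under that wording. Nothing here comes from an existing implementation: the function name `plan_write_zeroes`, the segment tuples, and the "write"/"punch" labels are hypothetical, and the sketch assumes both minimum sizes are powers of 2 with the minimum zero size at least as large as the minimum block size.

```python
import errno

def plan_write_zeroes(offset, length, min_block, min_zero):
    """Return (error, segments) for a write-zeroes request.

    Hypothetical helper: min_block is the primary minimum block size,
    min_zero is the alternate minimum zero size (treated as a hint).
    Each segment is (offset, length, method), where method is "punch"
    (efficient hole punching) or "write" (slower explicit zeroes).
    """
    # Only the primary alignment is a hard requirement.
    if offset % min_block or length % min_block:
        return errno.EINVAL, []

    end = offset + length
    # Round the start up and the end down to the zero granularity.
    head_end = min(end, -(-offset // min_zero) * min_zero)
    tail_start = max(head_end, end // min_zero * min_zero)

    segments = []
    if head_end > offset:
        # Unaligned head: fall back to the slower path.
        segments.append((offset, head_end - offset, "write"))
    if tail_start > head_end:
        # Aligned middle: can be zeroed efficiently.
        segments.append((head_end, tail_start - head_end, "punch"))
    if end > tail_start:
        # Unaligned tail: slower path again.
        segments.append((tail_start, end - tail_start, "write"))
    return 0, segments
```

With a minimum block size of 512 and a minimum zero size of 4096, a request for 8192 bytes at offset 512 is still honored: only the 4096-byte middle is punched, while the head and tail are written the slow way. An older client that knows only the primary alignments never sees a failure.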

I'll fix that in v2.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

