[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH v2 2/3] doc: Clarify maximum size limits



15.03.2018 00:16, Eric Blake wrote:
The existing text on block sizes had a redundant statement about
hard disconnects on large requests, but did not specify a size
where that comes into play.  In practice, several NBD
implementations have settled on 32M (qemu, kernel module) or 64M
(nbdkit) as the limit for maximum request size (at the client)
or hard disconnect size (at the server); thus we should document
32M as the portable limit that clients can expect and servers
should support as a way to increase interoperability.

There is also some confusion on whether WRITE_ZEROES must abide
by the same limits as WRITE.  In practice, servers permit larger
limits for TRIM and WRITE_ZEROES than for WRITE because there is
no payload involved; but a server may still want to enforce a
maximum size other than 4G because of other constraints (for
example, if a WRITE_ZEROES request is implemented by malloc'ing
a buffer and performing a naive writes, clamping the maximum
malloc size or limiting the amount of time spent on looping
to perform the writes may still be desirable for the server).
But in practice, existing servers reserve hard disconnect for
overlarge request payloads (NBD_CMD_WRITE) or what would require
an overlarge reply payload (NBD_CMD_READ), while any other
command will just receive an error rather than a hard disconnect.

With just one value for 'maximum block size', we have a slight
ambiguity where the server has to decide whether to advertise the
hard disconnect limit (32M) or the maximum size for other
commands, but hopefully the wording chosen here will make it
acceptable for a server that advertises 32M and then permits
clients to try larger WRITE_ZEROES for less traffic (where the
client knows to fall back to smaller requests if the large
request fail with EINVAL); as well as a server that advertises
a larger maximum size (as its actual limit for TRIM and
WRITE_ZEROES) while expecting the client to not exceed 32M
for the denial of service aspect.  The former interpretation
is slightly preferred, especially if a later patch adds a new
NBD_INFO_ field for documenting exact limits for TRIM and
WRITE_ZEROES during NBD_OPT_GO.

Finally, requiring a server to support 32M as its maximum block
size may be a bit large for some setups; when sizes are
advertised, it is reasonable to allow a server with a smaller
explicit limit of 1M.  Rephrase some constraints for easier
reading.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: more wording tweaks
---
  doc/proto.md | 79 +++++++++++++++++++++++++++++++++++-------------------------
  1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index adf2651..5dd3b53 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -739,28 +739,31 @@ information request, it MUST abide by the block size constraints it
  receives. Clients MAY issue `NBD_OPT_INFO` with `NBD_INFO_BLOCK_SIZE` to
  learn the server's constraints without committing to them.

-If block size constraints have not been advertised or agreed on externally,
-then a client SHOULD assume a default minimum block size of 1, a preferred
-block size of 2^12 (4,096), and a maximum block size of the smaller of
-the export size or 0xffffffff (effectively unlimited).  A server that
-wants to enforce block sizes other than the defaults specified here
-MAY refuse to go into transmission phase with a client that uses
-`NBD_OPT_EXPORT_NAME` (via a hard disconnect) or which fails to use
-`NBD_INFO_BLOCK_SIZE` with `NBD_OPT_GO` (where the server uses
-`NBD_REP_ERR_BLOCK_SIZE_REQD`), although a server SHOULD permit such
-clients if block size constraints are the default or can be agreed on
-externally.  When allowing such clients, the server MUST cleanly error
-commands that fall outside block size constraints without corrupting
-data; even so, this may limit interoperability.
+If block size constraints have not been advertised or agreed on

"advertised" here means answer on OPT_INFO or OPT_GO

+externally, then a server SHOULD support a default minimum block size
[1]
+of 1, a preferred block size of 2^12 (4,096), and a maximum block size
+that is effectively unlimited (0xffffffff, or the export size if that
+is smaller), while a client desiring maximum interoperability SHOULD
+constrain its requests to a minimum block size of 2^9 (512), and limit
+`NBD_CMD_READ` and `NBD_CMD_WRITE` commands to a maximum block size of
+2^25 (33,554,432).

for client, defaults: 1 -> 512, unlimited -> 32m. for server - unchanged. And wording is very understandable.


A server that wants to enforce block sizes other
(it's a server, which don't want to follow [1]SHOULD)

+than the defaults specified here MAY refuse to go into transmission
+phase with a client that uses `NBD_OPT_EXPORT_NAME` (via a hard
+disconnect) or which
unchanged


  uses `NBD_OPT_GO` without requesting
+`NBD_INFO_BLOCK_SIZE` (via an error reply of
+`NBD_REP_ERR_BLOCK_SIZE_REQD`); but servers SHOULD NOT refuse clients
+that do not request sizing information when the server supports
actually, all servers SHOULD support these defaults [1]

+default sizing or where sizing constraints can be agreed on
+externally.  When allowing clients that did not negotiate sizing via
+NBD, a server that enforces stricter block size constraints than the
+defaults MUST cleanly error commands that fall outside the constraints
+without corrupting data;
At first, I wanted to s/corrupting/changing, as not every changing may be considered as corrupting. But, actually, any changing when we return an error may be considered as wrong and corrupting. So, it should
be OK.

(hm, also, actually we allow server to error any command, if constraints are not advertised)


final cases for not advertised constraints:

any server SHOULD support defaults
if server support default, but want not default: it MAY, but SHOULD NOT refuse
if server doesn't support default:
    - it MAY refuse
    - it SHOULD NOT refuse if constraints can be agreed on externally
    - it MUST error commands that fall outside the constraints

for generic client it all just means, that generic server MAY permit, or MAY refuse, and MAY error commands if permitted connection.
(external agreement is a separate case, not about generic server and client)

  even so, enforcing constraints in this manner
+may limit interoperability.
-A client MAY choose to operate as if tighter block size constraints had
-been specified (for example, even when the server advertises the default
-minimum block size of 1, a client may safely use a minimum block size
-of 2^9 (512), a preferred block size of 2^16 (65,536), and a maximum
-block size of 2^25 (33,554,432)).  Notwithstanding any maximum block
-size advertised, either the server or the client MAY initiate a hard
-disconnect if the size of a request or a reply is large enough to be
-deemed a denial of service attack.
+A client MAY choose to operate as if tighter block size constraints
+had been specified (for example, even when the server advertises the
+default minimum block size of 1, a client may safely use a minimum
+block size of 2^9 (512)).

here dropped inconsistent part of the example and sentence about DOS attack.

will add a bit more context for some questions..


  The minimum block size represents the smallest addressable length and
  alignment within the export, although writing to an area that small
  may require the server to use a less-efficient read-modify-write
  action.  If advertised, this value MUST be a power of 2, MUST NOT be
  larger than 2^16 (65,536), and MAY be as small as 1 for an export
  backed by a regular file, although the values of 2^9 (512) or 2^12
  (4,096) are more typical for an export backed by a block device.  If a
  server advertises a minimum block size, the advertised export size
  SHOULD be an integer multiple of that block size, since otherwise, the
  client would be unable to access the final few bytes of the export.

  The preferred block size represents the minimum size at which aligned
  requests will have efficient I/O, avoiding behaviour such as
  read-modify-write.  If advertised, this MUST be a power of 2 at least
  as large as the smaller of the minimum block size and 2^12 (4,096),

pref_block >= min(min_block, 4K).. So, pref block may be less than
min_block??? it looks like a bug.
(consider min_block = 64K,  pref_block = 4K: 4K >= min(64K, 4K))

  although larger values (such as the minimum granularity of a hole) are
  also appropriate.
don't understand.. "are also" after "MUST"?

   The preferred block size MAY be larger than the
  export size, in which case the client is unable to utilize the
  preferred block size for that export.
Hmm, we we need it? Why not SHOULD NOT?


   The server MAY advertise an
  export size that is not an integer multiple of the preferred block
  size.


  The maximum block size represents the maximum length that the server
-is willing to handle in one request.  If advertised, it MUST be either
-an integer multiple of the minimum block size or the value 0xffffffff
-for no inherent limit, MUST be at least as large as the smaller of the
-preferred block size or export size, and SHOULD be at least 2^25
-(33,554,432) if the export is that large, but MAY be something other
-than a power of 2.  For convenience, the server MAY advertise a
-maximum block size that is larger than the export size, although in
-that case, the client MUST treat the export size as the effective
-maximum block size (as further constrained by a nonzero offset).
+is willing to handle in one request.  If advertised, it MAY be
+something other than a power of 2, but MUST be either an integer

but, for "something other", only 0xffffffff is possible, so actually the first
sub sentence is useless, I think it may be dropped. (to be multiple of the
minimum block size involves to be a power of 2)

+multiple of the minimum block size or the value 0xffffffff for no
+inherent limit, MUST be at least as large as the smaller of the
+preferred block size or export size, and SHOULD be at least 2^20
+(1,048,576) if the export is that large.  For convenience, the server
+MAY advertise a maximum block size that is larger than the export
+size, although in that case, the client MUST treat the export size as
+the effective maximum block size (as further constrained by a nonzero
+offset).

So here, only 32M -> 1M for the minimum and "MAY be something other than a power of 2" moved to the top. It's better, but I think it should be dropped at all.



  Where a transmission request can have a nonzero *offset* and/or
  *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
@@ -804,9 +808,18 @@ those requests, the client MUST NOT use a *length* larger than any
  advertised maximum block size or which, when added to *offset*, would
  exceed the export size.  The server SHOULD report an `EINVAL` error if
  the client's request is not aligned to advertised minimum block size
-boundaries, or is larger than the advertised maximum block size,
-although the server MAY instead initiate a hard disconnect if a large
-*length* could be deemed as a denial of service attack.
+boundaries, or is larger than the advertised maximum block size.

and what about requests which exceed the export size?

+Notwithstanding any maximum block size advertised, either the server
+or the client MAY initiate a hard disconnect if the payload of an
+`NBD_CMD_WRITE` request or `NBD_CMD_READ` reply would be large enough
+to be deemed a denial of service attack; however, for maximum
+portability, any length less than 2^25 (33,554,432) bytes SHOULD NOT
+be considered a denial of service attack (even if the advertised
+maximum block size is smaller).  For all other commands, where the
+*length* is not reflected in the payload (such as `NBD_CMD_TRIM` or
+`NBD_CMD_WRITE_ZEROES`), a server SHOULD merely fail the command with
+an `EINVAL` error for a client that exceeds the maximum block size,
+rather than initiating a hard disconnect.

about other commands: structured reply to BLOCK_STATUS may be near to 4G, so we'll want to change this paragraph a bit, when we merge BLOCK_STATUS.


  ## Values


Thank you for understandable rewordings!

--
Best regards,
Vladimir


Reply to: