[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [PATCH] doc: Clarify maximum size limits



Eric,

On 11 May 2016, at 21:21, Eric Blake <eblake@...696...> wrote:

> Requiring a server to support 32M as its maximum block size
> may be a bit large for some setups; rewrite things to make it
> obvious that a server may advertise a maximum as small as 1M,
> but that without an advertisement, there are existing clients
> (hello, kernel NBD module) that expect 32M to work.

Do we know the kernel never asks for more than 32M?

Do we know what the maximum size of a TRIM sent by the
kernel is?

> Add some clarity that servers SHOULD NOT treat anything
> smaller than 32M as a denial of service (and in contrast to
> the recent patch for a 16k limit during option haggling),
> and that denial-of-service limits may differ according to
> command.

That principle is fair enough.

> Also add a way for servers to advertise on a per-command basis
> when it is willing to accept larger limits because there is no
> payload involved; the two obvious commands are NBD_CMD_TRIM and
> NBD_CMD_WRITE_ZEROES, which now have NBD_INFO_TRIM_SIZE and
> NBD_INFO_ZERO_SIZE.  These new limits are listed separately
> from NBD_INFO_BLOCK_SIZE since both commands are optional and
> need not be implemented by a minimal server; but letting the
> server advertise them makes sense since at least scsi devices
> natively have three different size constraints for read/write,
> unmap, and writesame.  A server MAY honor larger trim requests
> even without advertising, but a client cannot rely on that
> working unless the additional sizes are advertised.

I'd be interested what Wouter thinks here. I know SCSI does
this. However, I wonder whether or not this is not overengineering
and excess complexity.

I think here we're in danger of pushing a pile of work onto
the server which is in all fairness the client's responsibility.

> The full text for NBD_INFO_ZERO_SIZE will be written later once
> either that extension (or this one) is promoted to stable; it can
> copy the pattern of NBD_INFO_TRIM_SIZE, except that where
> CMD_TRIM can round the request (and only actually trim a subset
> of the client request), CMD_WRITE_ZEROES must write actual zeroes
> over the entire request (but if it can punch holes, only a
> subset of the client request need result in holes).

One problem with branches and extensions :-)

> Signed-off-by: Eric Blake <eblake@...696...>
> ---
> 
> For the extension-info branch, and builds on the earlier
> patch sent for the master branch about handshake limits
> 
> doc/proto.md | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 79 insertions(+), 13 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 5692be5..bf29b3a 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -686,7 +686,9 @@ of 2^9 (512), a preferred block size of 2^16 (65,536), and a maximum
> block size of 2^25 (33,554,432)).  Notwithstanding any maximum block
> size advertised, either the server or the client MAY initiate a hard
> disconnect if the size of a request or a reply is large enough to be
> -deemed a denial of service attack.
> +deemed a denial of service attack.  However, for maximum portability,
> +any length less than 2^25 (33,554,432) bytes SHOULD NOT be considered
> +a denial of service attack.

OK subject to knowing whether 2^25 bytes is in fact the maximum the
kernel and other clients say.

> 
> The minimum block size represents the smallest addressable length and
> alignment within the export, although writing to an area that small
> @@ -710,29 +712,47 @@ preferred block size for that export.  The server MAY advertise an
> export size that is not an integer multiple of the preferred block
> size.
> 
> -The maximum block size represents the maximum length that the server
> -is willing to handle in one request.  If advertised, it MUST be either
> +The maximum block size represents the maximum payload length that the
> +server is willing to handle in one request (whether the payload is
> +from the client as in `NBD_CMD_WRITE` or from the server as in
> +`NBD_CMD_READ`).

So you're making this a payload length rather than a request length
maximum? If we go down this route I'd rather make this a request length
maximum, then have a list of 'exceptions' for different commands.
After all, at least 2 servers I know implement (or have a patch
to implement) WRITE_ZEROES simply by allocating the memory as
zeroes and then doing a normal write.

>  If advertised, it SHOULD be either
> an integer multiple of the minimum block size or the value 0xffffffff
> for no inherent limit, MUST be at least as large as the smaller of the
> -preferred block size or export size, and SHOULD be at least 2^25
> -(33,554,432) if the export is that large, but MAY be something other
> +preferred block size or export size, and SHOULD be at least 2^20
> +(1,048,576) if the export is that large, but MAY be something other

ok subject to discussion on the value.

> than a power of 2.  For convenience, the server MAY advertise a
> maximum block size that is larger than the export size, although in
> that case, the client MUST treat the export size as the effective
> maximum block size (as further constrained by a non-zero offset).
> 
> +Some commands (such as `NBD_CMD_TRIM`) allow the client to specify
> +*length* but do not involve a payload; for these commands, the client
> +and server may negotiate additional information about larger size
> +limits for those commands (such as `NBD_INFO_TRIM_SIZE`).  For these
> +commands, a client SHOULD NOT use a *length* larger than the
> +negotiated maximum block size from `NBD_INFO_BLOCK_SIZE` unless the
> +additional information was successfully negotiated.

So again, I think I'd just say (if we're going to down this route)
that it should be possible to send a list of:

     - 16 bits, `NBD_INFO_CMD_SIZE`
     - 16 bits, command number to which maximum / minimum apply  
     - 32 bits, minimum command size  
     - 32 bits, maximum command size  

and then try and touch the rest of the document as little as possible.

I remain unconvinced about the need at all though!

-- 
Alex Bligh







Reply to: