[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [PATCH v3 2/2] doc: Add details on block sizes



On 14 Apr 2016, at 01:37, Eric Blake <eblake@...696...> wrote:
> 
> +## Block sizes
> +
> +During transmission phase, several operations are constrained by the
> +export size sent in reply to `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`

or NBD_OPT_INFO?

> , as
> +well as by three block sizes defined here (minimum, preferred, and
> +maximum).  During the handshake phase, a client MAY announce its
> +intention to honor server block sizes via the experimental
> +`BLOCK_SIZE` extension; see below.  A client that is worried about
> +block size constraints SHOULD use `NBD_OPT_GO` rather than
> +`NBD_OPT_EXPORT_NAME`.  A server SHOULD advertise the block size
> +contraints during handshake phase via the experimental `INFO`
> +extension; see below.

I agree, but this is a bit circular, given this is an experimental
extension and thus there is no RFC2119 obligation to implement it.
I think we should leave this in though as a reminded for us to
deal with it when incorporated into the main standard.

>  A server and client MAY agree on block sizes
> +via out of band means.  Since a server cannot advertise block sizes
> +for clients

s/for/to/

> that use `NBD_OPT_EXPORT_NAME`, a server that wants to
> +enforce block sizes other than the defaults specified here MUST
> +support the experimental `INFO` extension, and MAY refuse to connect
> +to a client

"MAY refuse to allow a connection from a client" (clients connect to
servers, not vice versa), but see below.

> that does not use `NBD_OPT_GO`, but SHOULD NOT refuse to
> +connect to a client that does not use `NBD_OPT_BLOCK_SIZE`.

Really? If I've written my O_DIRECT only server, and a client
connects to it and says it's not going to respect my minimum
block size, I should permit a connection and fail operations
that I can't perform?

I think actually both of these are wrong. In both instances the
server should permit the *connection* (the TCP connection) but
they should error going into transmission mode if they don't
like the fact the client doesn't support block sizes.

I don't see a necessity to error if it connects via NBD_OPT_EXPORT_NAME
and the client has a minimum block size (this seems like a recipe
to prevent qemu-nbd and qemu connecting to each other), and you've
already said clients/servers MAY exchange this information out of band,
so I think disconnecting solely because the client has used
NBD_OPT_EXPORT_NAME is probably something we should not encourage
with a specific 'MAY'.

And in any case "MAY refuse to allow a connection from a client" again.

> +
> +If block sizes have not been advertised or agreed on externally, then
> +a client SHOULD assume a default minimum block size of 1, a preferred
> +block size of 4,096, and no inherent maximum block size; but MAY
> +choose to operate as if other block sizes had been specified (for
> +example, by using a minimum block size of 512, a preferred block size
> +of 65,536, and a maximum block size of 3,355,442 (32M)).

I'd prefer that rather than 'but MAY' (which somewhat undoes the 'SHOULD')
we have "if the server chooses to operate ... [above text] ... then
a consequence will be that a client with which it has not negotiated
these different block sizes out of band may send requests (e.g. requests
smaller than 512 bytes or larger than 32M in this case) which the
server cannot service." - i.e. we should point out the consequences
of breaching the 'SHOULD' rather than encouraging it.

>  A server
> +that does not advertise block sizes (including via external agreement)
> +SHOULD NOT reject operations due to alignment constraints (that is, it
> +SHOULD treat the minimum block size as 1),

Hang on, we've already said it SHOULD assume a default minimum block
size of 1 ...

> and SHOULD have no inherent
> +maximum block size,

... and that

so I'd forget those two, but

> although it MAY still terminate the connection on
> +any request where the length is large enough to be deemed a denial of
> +service attack.

this phrase should remain, but should also apply to servers that
did explicitly specify an explicit maximum block size of infinity (-1).

Also "terminate the connection" -> "initiate a hard disconnect" (no
harm in using the new language even if the whole of the patch
has not been taken.

Also the DoS provision should apply to the client (a server could
send a pile of unwanted stuff).

How about deleting this whole sentence and saying.

"Nothwithstanding any maximum block size advertised, either
the server or the client MAY initiate a hard disconnect if
the size of a request or a reply is large enough to be deemed a denial
of service attack."

> A server that does advertise block sizes MUST be
> +prepared for clients that ignore the advertisement, by, at the very
> +least, sending appropriate errors, and MUST NOT corrupt data in that
> +case.

If we permit servers not avoid entering transmission mode with
clients that don't respect block sizes, and the server also
advertises block sizes, then if the client uses offsets / lengths
that don't respect the block size rules, this is violation of the
protocol and servers are already permitted to disconnect. The
problem is thus with clients that don't claim they will respect
block sizes.

I think rather we should say:

A server that advertises block sizes and [enters transmission
mode with a client that does not advertise support for block
sizes] MUST cleanly error commands that fall outside those blocksize
parameters and not corrupt data.

bit in [] optional I suppose.


> +The minimum block size represents the smallest addressable length and
> +alignment within the export, although writing to an area that small
> +may require the server to use a less-efficient read-modify-write
> +action.  If advertised, this value MUST be a power of 2, MUST NOT be
> +larger than 65,536, and MAY be as small as 1 for an export backed by a
> +regular file, although the values of 512 or 4,096 are more typical for
> +an export backed by a block device.  If a server advertises a minimum
> +block size, the advertised export size MUST be an integer multiple of
> +that block size.
> +
> +The preferred block size represents the minimum size at which aligned
> +requests will have efficient I/O, avoiding behaviour such as
> +read-modify-write.  If advertised, this MUST be a power of 2 at least
> +as large as the smaller of the minimum block size and 4,096, although
> +larger values (such as the minimum granularity of a hole) are also
> +appropriate.  The preferred block size MAY be larger than the export
> +size, in which case the client is unable to utilize the preferred
> +block size for that export.
> +
> +The maximum block size represents the maximum length that the server
> +is willing to handle in one request.  If advertised, it MUST be either
> +an integer multiple of the minimum block size or the value
> +4,294,967,295 (that is, (uint32_t)-1) for no inherent limit, MUST be
> +at least as large as the preferred block size, and SHOULD be at least
> +3,355,442 (32M), but MAY be something other than a power of 2.  For
> +convenience, the server MAY advertise a maximum block size that is
> +larger than the export size, although in that case, the client MUST
> +treat the export size as the effective maximum block size (as further
> +constrained by a non-zero offset).
> +
> +Where a transmission request can have a non-zero *offset* and/or
> +*length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
> +the client MUST ensure that *offset* and *length* are integer
> +multiples of any advertised minimum block size, and SHOULD use integer
> +multiples of any advertised preferred block size where possible.  For
> +those requests, the client SHOULD NOT use a *length* of zero,

I thought you wanted to ditch the zero length thing? Not sure what
it adds here given your example re trim of length 0. 0 is an integer
multiple of anything.

> +* `NBD_OPT_BLOCK_SIZE`
> +
> +    The client wishes to inform the server of its intent to obey block
> +    sizes.  The option request has no additional data.
> +
> +    The server MUST reply with `NBD_REP_ACK`, after which point the
> +    client SHOULD use `NBD_OPT_GO` rather than `NBD_OPT_EXPORT_NAME`,
> +    and the server SHOULD include `NBD_INFO_BLOCK_SIZE` in its reply.
> +    If successfully negotiated, and the server advertises block sizes,
> +    the client MUST NOT send unaligned requests.
> +
> +    For backwards compatibility, clients SHOULD be prepared to also
> +    handle `NBD_REP_ERR_UNSUP`, which means the server SHOULD NOT be
> +    advertising block sizes, and the client MAY assume defaults.

Could we not just use one of the client flags? We have 32 bits of
them (as opposed to 16 server flags). It's right at the start
of the negotiation, and we simply need to refer to whether
NBD_FLAG_C_BLOCKSIZE is set. It can be in the top 16 bits if we
don't want to use a flag which can match a server flag.

-- 
Alex Bligh







Reply to: