[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] [RFC PATCH] doc: Add new NBD_FLAG_BLOCK_SIZE extension



On 12 Apr 2016, at 05:04, Eric Blake <eblake@...696...> wrote:

> Existing NBD servers often have limitations, such as requiring
> actions to be aligned to block sizes or limiting maximum
> transactions to avoid denial of service attacks; for example,
> qemu's NBD server refuses any transaction larger than 32M.  But
> to date, clients have to learn these limitations via out-of-band
> means.

I fully support solving this.

> This adds a new negotiation flag to let client and server agree
> to alter the response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO to
> advertise the server's limitations, as well as documenting sane
> defaults that a client should follow if the information was not
> documented.
> 
> This is done as a global flag rather than a new NBD_OPT, for a
> couple of reasons.  First, block size information can be
> implemented even without support for NBD_FLAG_FIXED_NEWSTYLE,
> because of the reserved block of zeroes sent in response to
> NBD_CMD_EXPORT_NAME.

This is a bad reason. Clients & servers without support
for NBD_FLAG_FIXED_NEWSTYLE should be burned with fire.
We should do nothing to help them beyond help them upgrade
to sanity (I'm serious).

>  Second, negotiating the option changes
> the NBD_REP_SERVER message size sent by the server for
> NBD_OPT_INFO; although this reply is structured and the length
> will be reliable, it would still be awkward for out-of-order
> option replies from the server to ping-pong in size within a
> single session.  Making the negotiation part of the initial
> global flag handshake means the rest of the session has
> consistent message sizing.

I don't think this is a great reason either. The INFO
extension is still experimental. I would prefer to fix the
INFO statement to transmit information about the export.

Perhaps instead of the name (before the other bit that looks
like NBD_OPT_EXPORT_NAME's reply), we should *now* specify
a structured selection of export information, e.g.

0000: 32 bits: information element length
0004: 32 bits: information element type
0008 ... length: information element data

We can then put what we like in there, and we have a future-proof
way of sending mount information.

The name then can be the first of these, i.e.

Element type 0001, Data: length of name (32 bits, followed by name).

We can then transmit the other data as an element type.

> 
> Signed-off-by: Eric Blake <eblake@...696...>
> ---
> 
> Perhaps I should try harder to make the an experimental extension,
> rather than directly placing it into the normative section.  But
> I wanted to first get it out for an initial review on whether the
> ideas presented make sense.
> 
> 
> Requires these patches to be applied first:
> [AB] [PATCHv8] Improve documentation for TLS
> [AB*] [PATCHv6] docs/proto.md: Clarify SHOULD / MUST / MAY etc
> [EB] [PATCHv2] doc: In STRUCTURED_REPLY, make error types easy to recognize
> 
> doc/proto.md | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 125 insertions(+), 22 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 06a275d..f721db9 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -143,12 +143,24 @@ understand.
> 
> If the value of the option field is `NBD_OPT_EXPORT_NAME` and the server
> is willing to allow the export, the server replies with information
> -about the used export:
> +about the used export.  The amount of information sent is determined
> +by what client flags were succesfully negotiated (134 bytes if
> +`NBD_FLAG_C_NO_ZEROES` is clear, otherwise 22 bytes if
> +`NBD_FLAG_C_BLOCK_SIZE` is set, otherwise 10 bytes).
> 
> S: 64 bits, size of the export in bytes (unsigned)  
> S: 16 bits, transmission flags  
> -S: 124 bytes, zeroes (reserved) (unless `NBD_FLAG_C_NO_ZEROES` was
> -   negotiated by the client)  
> +S: 32 bits, minimum block size (valid if `NBD_FLAG_C_BLOCK_SIZE`
> +   was negotiated, otherwise omitted if `NBD_FLAG_C_NO_ZEROES` was
> +   negotiated, otherwise zero)  
> +S: 32 bits, preferred block size (valid if `NBD_FLAG_C_BLOCK_SIZE`
> +   was negotiated, otherwise omitted if `NBD_FLAG_C_NO_ZEROES` was
> +   negotiated, otherwise zero)  
> +S: 32 bits, maximum block size (valid if `NBD_FLAG_C_BLOCK_SIZE`
> +   was negotiated, otherwise omitted if `NBD_FLAG_C_NO_ZEROES` was
> +   negotiated, otherwise zero)  
> +S: 112 bytes, zeroes (reserved) (omitted `NBD_FLAG_C_NO_ZEROES` was
> +   negotiated)  
> 
> If the server is unwilling to allow the export, it SHOULD close the
> connection.
> @@ -162,6 +174,40 @@ bit, signalling that an extra flag field will follow, to which the
> client will have to reply with a flag field of its own before the extra
> flags are sent. This is not yet implemented.

All the above I wouldn't do (for the reasons above) :-)

> +The three block size fields represent constraints that the client
> +SHOULD obey during transmission phase, as further detailed in
> +individual transmission requests.  If `NBD_FLAG_C_BLOCK_SIZE` was not
> +negotiated, the server MUST use zero for all three fields (or omit
> +them, when `NBD_FLAG_C_NO_ZEROES` is negotiated), and the client
> +SHOULD assume a minimum block size of 512 bytes, a maximum block size
> +of 33554432 (32M) bytes (unless the export size is smaller), and no
> +hint on a preferred block size.  Otherwise, the three values MUST each
> +be non-zero, with the preferred size at least as large as the minimum,
> +and the maximum at least as large as the preferred size, and the
> +export size at least as large as the maximum.

It seems a bit harsh that I can't specify maximum block size as -1
to mean 'send what you like' given that's current behaviour.

> +
> +The minimum block size MUST be a power of 2, and represents the
> +smallest addressable length and alignment within the export, even if
> +writing a block that small requires the server to use a less-efficient
> +read-modify-write action.  This value MAY be as small as 1 if the
> +export is backed by a regular file, although the values of 512 or 4096
> +are more typical for an export backed by a block device.  When block
> +sizes are negotiated, the server MUST return an export size that is an
> +integer multiple of the minimum block size.

I think you should make it clear that ALL requests MUST have an
offset AND a length which are multiples of the minimum blocksize.
Ah, you handle this below, but it might be worth saying here
what this means.

+1 on export size.

Should this not be larger than 32M?

> +The preferred block size MUST be an integer multiple of the minimum
> +block size, SHOULD be either a power of 2 or equal to the export size,
> +and MUST NOT be larger than 33554432 (32M) nor the export size.

See below re export size.

>  It
> +represents the minimal length and alignment required for optimal I/O
> +access; a typical size might be 65536.  The export size MAY be other
> +than an integer multiple of the preferred size.
> +
> +The maximum block size represents the largest length that the server
> +is willing to handle in a single transaction.  It MUST be an integer
> +multiple of the minimum block size, MUST be at least the smaller of
> +33554432 (32M) or the export size, and MUST NOT be larger than the
> +export size.

Wording is a bit difficult. I think better (modulo the -1 thing):
"it MUST be at least 32MB".

Obviously they can't send offsets or lengths greater than the export
size, but that's handled elsewhere. In practical terms it's
easier to get this information from the driver, and leave the export
size out of it.

IE I'd prefer what happened is the driver got the native export size
(length of file or whatever), and adjusted THAT with its own
minimum block size (rounded it down to the next multiple), then passed
through that (rounded) export size, together with the minimum, maximum
and preferred block sizes (each unaltered by the export size).

> +
> #### Fixed newstyle negotiation
> 
> Unfortunately, due to a mistake, the server would immediately close the
> @@ -600,8 +646,15 @@ the first magic number.
> - bit 0, `NBD_FLAG_FIXED_NEWSTYLE`; MUST be set by servers that
>   support the fixed newstyle protocol
> - bit 1, `NBD_FLAG_NO_ZEROES`; if set, and if the client replies with
> -  `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST NOT
> -  send the 124 bytes of zero at the end of the negotiation.
> +  `NBD_FLAG_C_NO_ZEROES` in the client flags field, the server MUST
> +  NOT send trailing bytes of zero in response to `NBD_OPT_EXPORT_NAME`
> +  at the end of the negotiation (omitting either 112 or 124 bytes,
> +  depending on `NBD_FLAG_C_BLOCK_SIZE`).
> +- bit 2, `NBD_FLAG_BLOCK_SIZE`; if set, and if the client replies with
> +  `NBD_FLAG_C_BLOCK_SIZE` in the client flags field, the server MUST
> +  advertize non-zero block sizes in response to `NBD_OPT_EXPORT_NAME`,
> +  `NBD_OPT_INFO`, and `NBD_OPT_GO`, and clients MUST obey these
> +  constraints during the transmission phase.
> 
> The server MUST NOT set any other flags, and SHOULD NOT change behaviour
> unless the client responds with a corresponding flag.  The server MUST
> @@ -652,8 +705,12 @@ receiving the handshake flags from the server.
>   fixed newstyle from clients that didn't set this bit, but relying on
>   this isn't recommended.
> - bit 1, `NBD_FLAG_C_NO_ZEROES`; MUST NOT be set if the server did not
> -  set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124
> -  bytes of zeroes at the end of the negotiation.
> +  set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 112
> +  or 124 bytes of zeroes after `NBD_OPT_EXPORT_NAME`.
> +- bit 2, `NBD_FLAG_C_BLOCK_SIZE`; MUST NOT be set if the server did not
> +  set `NBD_FLAG_BLOCK_SIZE`. If set, the server MUST send block size
> +  information as part of a successful reply to `NBD_OPT_EXPORT_NAME`,
> +  `NBD_OPT_INFO`, and `NBD_OPT_GO`.
> 
> Clients MUST NOT set any other flags; the server MUST drop the
> connection if the client sets an unknown flag, or a flag that does
> @@ -848,6 +905,14 @@ The following request types exist:
>     of data, read from *offset* bytes into the file, unless an error
>     condition has occurred.
> 
> +    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
> +    that *offset* and *length* are integer multiples of the minimum
> +    block size, and SHOULD ensure that they are integer multiples of
> +    the preferred block size where possible.  The client SHOULD NOT
> +    request a length larger than the maximum block size.

MUST NOT if this is a maximum block size, else what's the point?

I think I would handle this in one place - just say 'on all requests
that specify a non-zero offset or length, ....'

>  The server
> +    SHOULD report an `EINVAL` error if the client does not honor block
> +    sizes.
> +
>     If an error occurs, the server SHOULD set the appropriate error
>     code in the error field. The server MUST then either close the
>     connection, or send *length* bytes of data (these bytes MAY be
> @@ -871,6 +936,14 @@ The following request types exist:
>     data to be written. The client MUST follow the request header with
>     *length* number of bytes to be written to the device.
> 
> +    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
> +    that *offset* and *length* are integer multiples of the minimum
> +    block size, and SHOULD ensure that they are integer multiples of
> +    the preferred block size where possible.  The client SHOULD NOT
> +    request a length larger than the maximum block size.  The server
> +    SHOULD report an `EINVAL` error if the client does not honor block
> +    sizes.
> +
>     The server MUST write the data to disk, and then send the reply
>     message. The server MAY send the reply message before the data has
>     reached permanent storage.
> @@ -906,6 +979,14 @@ The following request types exist:
>     longer needed. A server MAY discard len bytes starting at offset, but
>     is not required to.
> 
> +    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
> +    that *offset* and *length* are integer multiples of the minimum
> +    block size, and SHOULD ensure that they are integer multiples of
> +    the preferred block size where possible.  The client SHOULD NOT
> +    request a length larger than the maximum block size.  The server
> +    SHOULD report an `EINVAL` error if the client does not honor block
> +    sizes.
> +
>     After issuing this command, a client MUST NOT make any assumptions
>     about the contents of the export affected by this command, until
>     overwriting it again with `NBD_CMD_WRITE` or `NBD_CMD_WRITE_ZEROES`.
> @@ -952,11 +1033,13 @@ The following error values are defined:
>   experimental `STRUCTURED_REPLY` extension; see below.
> 
> The server SHOULD return `ENOSPC` if it receives a write request
> -including one or more sectors beyond the size of the device.  It SHOULD
> -return `EINVAL` if it receives a read or trim request including one or
> -more sectors beyond the size of the device.  It also SHOULD map the
> -`EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
> -`EPERM` if it receives a write or trim request on a read-only export.
> +including one or more sectors beyond the size of the device.  It
> +SHOULD return `EINVAL` if it receives a read or trim request including
> +one or more sectors beyond the size of the device, or if a read or
> +write request does not comply with advertised minimum and maximum
> +block sizes.  It also SHOULD map the `EDQUOT` and `EFBIG` errors to
> +`ENOSPC`.  Finally, it SHOULD return `EPERM` if it receives a write or
> +trim request on a read-only export.
> 
> The server SHOULD return `EINVAL` if it receives an unknown command.
> 
> @@ -1042,7 +1125,9 @@ Therefore these commands share common documentation.
>     interpretation than the default (so as to provide additional
>     binary information normally sent in reply to `NBD_OPT_EXPORT_NAME`,
>     in place of the default free-form string). The option reply length
> -    MUST be *length of name* + 14, and the option data has the following layout:
> +    MUST be *length of name* + 14 if `NBD_FLAG_C_BLOCK_SIZE` was not
> +    negotiated, or *length of name* + 26 if it was.  The option data
> +    has the following layout:
> 
>     - 32 bits, length of name (unsigned)  
>     - String: name of the export. This name MAY be different from the one
> @@ -1050,7 +1135,10 @@ Therefore these commands share common documentation.
>       server has multiple alternate names for a single export, or a
>       default export was specified.  
>     - 64 bits, size of the export in bytes (unsigned)  
> -    - 16 bits, transmission flags.  
> +    - 16 bits, transmission flags  
> +    - 32 bits, minimum block size (only if `NBD_FLAG_C_BLOCK_SIZE`)  
> +    - 32 bits, preferred block size (only if `NBD_FLAG_C_BLOCK_SIZE`)  
> +    - 32 bits, maximum block size (only if `NBD_FLAG_C_BLOCK_SIZE`)  
> 
>     If there are no intervening option requests between a successful
>     `NBD_OPT_INFO` (that is, one that replied with `NBD_REP_SERVER`)
> @@ -1095,6 +1183,14 @@ command flag.
>     A write request with no payload. *Offset* and *length* define the location
>     and amount of data to be zeroed.
> 
> +    If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`, it MUST ensure
> +    that *offset* and *length* are integer multiples of the minimum
> +    block size, and SHOULD ensure that they are integer multiples of
> +    the preferred block size where possible.  The client SHOULD NOT
> +    request a length larger than the maximum block size.  The server
> +    SHOULD report an `EINVAL` error if the client does not honor block
> +    sizes.
> +
>     The server MUST zero out the data on disk, and then send the reply
>     message. The server MAY send the reply message before the data has
>     reached permanent storage.
> @@ -1347,8 +1443,9 @@ error, and alters the reply to the `NBD_CMD_READ` request.
>     The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
>     client has requested `NBD_CMD_FLAG_DF` for a length that is too
>     large to read without fragmentation.  The server MUST NOT return
> -    this error if the read request did not exceed 65,536 bytes, and
> -    SHOULD NOT return this error if `NBD_CMD_FLAG_DF` is not set.
> +    this error if the read request did not exceed the smaller of 65,536
> +    bytes or of the negotiated preferred block size, and SHOULD NOT
> +    return this error if `NBD_CMD_FLAG_DF` is not set.
> 
> * `NBD_CMD_READ`
> 
> @@ -1371,12 +1468,18 @@ error, and alters the reply to the `NBD_CMD_READ` request.
>     overhead, the server SHOULD use chunks with lengths and offsets as
>     an integer multiple of 512 bytes, where possible (the first and
>     last chunk of an unaligned read being the most obvious places for
> -    an exception).  The server MUST NOT send content chunks that
> -    overlap with any earlier content or error chunk, and MUST NOT send
> -    chunks that describe data outside the offset and length of the
> -    request, but MAY send the content chunks in any order (the client
> -    MUST reassemble content chunks into the correct order), and MAY
> -    send additional content chunks even after reporting an error chunk.
> +    an exception).  If the client negotiated `NBD_FLAG_C_BLOCK_SIZE`,
> +    the server MUST ensure that every chunk has a length and offset
> +    are an integer multiple of the advertised minimum block size, and
> +    SHOULD use chunks with multiples of the preferred block size where
> +    possible.
> +
> +    The server MUST NOT send content chunks that overlap with any
> +    earlier content or error chunk, and MUST NOT send chunks that
> +    describe data outside the offset and length of the request, but
> +    MAY send the content chunks in any order (the client MUST
> +    reassemble content chunks into the correct order), and MAY send
> +    additional content chunks even after reporting an error chunk.
>     Note that a request for more than 2^32 - 8 bytes MUST be split
>     into at least two chunks, so as not to overflow the length field
>     of a reply while still allowing space for the offset of each
> -- 
> 2.5.5
> 
> 
> ------------------------------------------------------------------------------
> Find and fix application performance issues faster with Applications Manager
> Applications Manager provides deep performance insights into multiple tiers of
> your business applications. It resolves application problems quickly and
> reduces your MTTR. Get your free trial!
> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
> _______________________________________________
> Nbd-general mailing list
> Nbd-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nbd-general
> 

-- 
Alex Bligh







Reply to: