
Re: [Nbd] [PATCHv6] Remove NBD_OPT_BLOCK_SIZE; add specific requests to NBD_OPT_INFO



On 04/29/2016 08:29 AM, Wouter Verhelst wrote:

>> You can't send NBD_REP_ERR_BLOCK_SIZE_REQD in response to an NBD_OPT_INFO
>> if it's asked for NBD_INFO_BLOCK_SIZE.
>>
>> If it has not asked for NBD_INFO_BLOCK_SIZE it is legitimate to error
>> the NBD_OPT_INFO with NBD_REP_ERR_BLOCK_SIZE_REQD so that the client knows
>> that if it sends an NBD_OPT_GO with the same parameters it would get that
>> error, and hence it should either ask for block size constraints or
>> give up.
> 
> Oh, right. I hadn't considered sending an ERR_BLOCK_SIZE_REQD on an INFO
> request without a request for block sizes (after all, it's just an
> information request, the fact that you don't need information on
> everything doesn't mean you'll break things), but I suppose it makes
> sense to do that.

More importantly, if we _don't_ fail the NBD_OPT_INFO, then we _can't_
fail the NBD_OPT_GO. Nothing says that NBD_OPT_GO has to have the same
failure and/or information as a failed NBD_OPT_INFO with the same
parameters, nor that NBD_OPT_GO must not succeed if NBD_OPT_INFO failed;
but we DO want to make sure that if NBD_OPT_GO is going to fail, then
NBD_OPT_INFO should also fail in the sanest way possible.
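
To spell out that constraint, here is a minimal Python sketch. The
function name is made up for illustration and is not anything from the
spec; it only encodes "a successful NBD_OPT_INFO implies a successful
NBD_OPT_GO" for the same export and the same information requests.

```python
def outcome_pair_allowed(info_succeeded: bool, go_succeeded: bool) -> bool:
    """Return True if this INFO/GO outcome pair is acceptable.

    Hypothetical helper: the only forbidden combination is a successful
    NBD_OPT_INFO followed by a failed NBD_OPT_GO with the same parameters.
    """
    return (not info_succeeded) or go_succeeded


# Print the full truth table for the four possible outcome pairs.
for info_ok in (True, False):
    for go_ok in (True, False):
        print(info_ok, go_ok, outcome_pair_allowed(info_ok, go_ok))
```

A failed NBD_OPT_INFO places no requirement on the later NBD_OPT_GO, so
both rows with info_ok False are allowed.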

> 
> It might make sense for such a server to still send all the information
> that the client *did* ask for in the reply, it would just send an error
> along as well to signal that more is going to be needed, but I don't
> suppose that's critical.

Yes, a good server should basically reply with everything that it
normally would on success, and then just switch NBD_REP_ACK to
NBD_REP_ERR_BLOCK_SIZE_REQD as the last message.
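
As a rough Python sketch of that reply sequence: the reply and info
types are plain strings rather than wire values, build_replies is a
hypothetical helper, and I am assuming each piece of information goes
out as its own NBD_REP_INFO reply before the final message.

```python
def build_replies(info_items, block_size_required):
    """Return the list of (reply_type, payload) tuples for one option.

    Hypothetical helper: every info item is sent exactly as it would be
    on success; only the final reply differs between success and failure.
    """
    # Assumption: one NBD_REP_INFO reply per piece of information.
    replies = [("NBD_REP_INFO", item) for item in info_items]
    if block_size_required:
        final = "NBD_REP_ERR_BLOCK_SIZE_REQD"
    else:
        final = "NBD_REP_ACK"
    replies.append((final, None))
    return replies


ok = build_replies(["NBD_INFO_EXPORT"], block_size_required=False)
err = build_replies(["NBD_INFO_EXPORT"], block_size_required=True)
# Everything except the last message is identical in both cases.
print(ok[:-1] == err[:-1], ok[-1][0], err[-1][0])
```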

Here's what I'm planning on doing in my next qemu spin:

The server can always support a minimum block size of 1 (it already has
code to do a read-modify-write when needed), so it will never disconnect
a client that uses NBD_OPT_EXPORT_NAME, nor will such a client ever get
an EINVAL for an unaligned read.  However, the natural block size depends
on what is being served: 1 is efficient for a regular file on the file
system, while 512 or 4096 is more typical for an actual block device.  So
on a per-export basis, the server will prefer to advertise a minimum
block size that matches the type of file it is serving, where possible.
It works out to these four cases:

If the client calls NBD_OPT_INFO without NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, and advertise the block size of
the actual file. If the block size is 1, the server will conclude with
NBD_REP_ACK; if the block size is > 1, the server will conclude with
NBD_REP_ERR_BLOCK_SIZE_REQD.

If the client calls NBD_OPT_INFO with NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, and advertise the block size of
the actual file.  It will then conclude with NBD_REP_ACK.

If the client calls NBD_OPT_GO without NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, except that the minimum block
size will be 1, then conclude with NBD_REP_ACK.  That way, the client
cannot send an unaligned request, and the server doesn't have to worry
about reporting EINVAL.  Note that for a file with native minimum block
size > 1, this is a success reply even when the corresponding
NBD_OPT_INFO with the same parameters would have failed, and with
different information - but the way we worded this, _it is okay_.  We
have no requirement on NBD_OPT_GO due to a failed NBD_OPT_INFO, only due
to a successful one.

If the client calls NBD_OPT_GO with NBD_INFO_BLOCK_SIZE, the server will
reply with all information it has, including the correct minimum block size, and
conclude with NBD_REP_ACK.  If the client then sends an unaligned
request, it was the client that violated the protocol, so all bets are
now off (the server will happen to honor the request, rather than
disconnect or fail with the suggested EINVAL, but the client was
out-of-spec so it shouldn't be relying on any particular server
behavior, whether success or a particular error).
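
The four cases above can be condensed into one decision function.  This
is only a Python sketch of the plan, not the qemu code: negotiate, Reply,
and the parameter names are invented for illustration, and option and
reply types are strings rather than wire values.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    advertised_min_block: int  # minimum block size sent to the client
    final: str                 # last reply in the sequence


def negotiate(option: str, asked_block_size: bool, native_min_block: int) -> Reply:
    """Pick the advertised minimum block size and the concluding reply.

    option           -- "NBD_OPT_INFO" or "NBD_OPT_GO"
    asked_block_size -- True if the client requested NBD_INFO_BLOCK_SIZE
    native_min_block -- natural minimum for the export (1, 512, 4096, ...)
    """
    if option not in ("NBD_OPT_INFO", "NBD_OPT_GO"):
        raise ValueError(f"unexpected option: {option}")
    if asked_block_size:
        # Cases 2 and 4: the client will honor the real constraint.
        return Reply(native_min_block, "NBD_REP_ACK")
    if option == "NBD_OPT_INFO":
        # Case 1: report the real size; fail only if it is larger than 1.
        if native_min_block == 1:
            return Reply(native_min_block, "NBD_REP_ACK")
        return Reply(native_min_block, "NBD_REP_ERR_BLOCK_SIZE_REQD")
    # Case 3: NBD_OPT_GO without a block size request always succeeds,
    # falling back to 1 (the server does read-modify-write as needed).
    return Reply(1, "NBD_REP_ACK")


for opt in ("NBD_OPT_INFO", "NBD_OPT_GO"):
    for asked in (False, True):
        print(opt, asked, negotiate(opt, asked, 4096))
```

Note that NBD_OPT_GO never fails here, so a successful NBD_OPT_INFO is
always followed by a successful NBD_OPT_GO with the same parameters.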

Let me know if any of the above reasoning is wrong, or if we should try
harder to document this in the spec to make it clear to server
implementors how they can be portable to the largest number of clients.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
