On 04/29/2016 08:29 AM, Wouter Verhelst wrote:
>> You can't send NBD_REP_ERR_BLOCK_SIZE_REQD in response to an NBD_OPT_INFO
>> if it's asked for NBD_INFO_BLOCK_SIZE.
>>
>> If it has not asked for NBD_INFO_BLOCK_SIZE it is legitimate to error
>> the NBD_OPT_INFO with NBD_REP_ERR_BLOCK_SIZE_REQD so that the client knows
>> that if sends an NBD_OPT_GO with the same parameters it would get that
>> error, and hence it should either ask for block size constraints or
>> give up.
>
> Oh, right. I hadn't considered sending an ERR_BLOCK_SIZE_REQD on an INFO
> request without a request for block sizes (after all, it's just an
> information request, the fact that you don't need information on
> everything doesn't mean you'll break things), but I suppose it makes
> sense to do that.

More importantly, if we _don't_ fail the NBD_OPT_INFO, then we _can't_
fail the NBD_OPT_GO. Nothing says that NBD_OPT_GO has to have the same
failure and/or information as a failed NBD_OPT_INFO with the same
parameters, nor that NBD_OPT_GO must not succeed if NBD_OPT_INFO failed;
but we DO want to make sure that if NBD_OPT_GO is going to fail, then
NBD_OPT_INFO should also fail in the sanest way possible.

>
> It might make sense for such a server to still send all the information
> that the client *did* ask for in the reply, it would just send an error
> along as well to signal that more is going to be needed, but I don't
> suppose that's critical.

Yes, a good server should basically reply with everything that it
normally would on success, and then just switch NBD_REP_ACK to
NBD_REP_ERR_BLOCK_SIZE_REQD as the last message.

Here's what I'm planning on doing in my next qemu spin:

The server can always do a minimum block size of 1 (it already has code
to do a read-modify-write, when needed), so it will never disconnect a
client that uses NBD_OPT_EXPORT_NAME, nor will such a client ever get an
EINVAL for an unaligned read.
However, there are some files that are more efficient with a block size
of 1 (anything on the file system) than others (an actual block device,
where 512 or 4096 is more typical). So on a per-export basis, the server
will prefer to advertise a minimum block size that matches the type of
file it is serving, where possible. It works out to these four cases:

If the client calls NBD_OPT_INFO without NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, and advertise the block size of
the actual file. If the block size is 1, the server will conclude with
NBD_REP_ACK; if the block size is > 1, the server will conclude with
NBD_REP_ERR_BLOCK_SIZE_REQD.

If the client calls NBD_OPT_INFO with NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, and advertise the block size of
the actual file. It will then conclude with NBD_REP_ACK.

If the client calls NBD_OPT_GO without NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, except that the minimum block
size will be 1, then conclude with NBD_REP_ACK. That way, the client
cannot send an unaligned request, and the server doesn't have to worry
about reporting EINVAL. Note that for a file with native minimum block
size > 1, this is a success reply even when the corresponding
NBD_OPT_INFO with the same parameters would have failed, and with
different information - but the way we worded this, _it is okay_. We
have no requirement on NBD_OPT_GO due to a failed NBD_OPT_INFO, only due
to a successful one.

If the client calls NBD_OPT_GO with NBD_INFO_BLOCK_SIZE, the server will
reply with all information it has, including correct minimum size, and
conclude with NBD_REP_ACK.
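
To make the four cases easier to check against each other, here is a
rough sketch of the decision as a single function. It is only a model of
the plan described above, not qemu code: the names Option, FinalReply,
Outcome and negotiate are made up for illustration, and the enum values
are just the spec's symbolic names rather than wire values.

```python
from enum import Enum
from typing import NamedTuple


class Option(Enum):
    # Hypothetical stand-ins for the two client options discussed here.
    INFO = "NBD_OPT_INFO"
    GO = "NBD_OPT_GO"


class FinalReply(Enum):
    # Hypothetical stand-ins for the concluding reply of the option.
    ACK = "NBD_REP_ACK"
    ERR_BLOCK_SIZE_REQD = "NBD_REP_ERR_BLOCK_SIZE_REQD"


class Outcome(NamedTuple):
    advertised_min_block: int
    final_reply: FinalReply


def negotiate(option: Option, client_requested_block_size: bool,
              native_min_block: int) -> Outcome:
    """Model the four cases from the plan above (illustrative only)."""
    if option is Option.GO and not client_requested_block_size:
        # Case 3: fall back to a minimum of 1 so that no request from
        # this client can be unaligned; always succeeds.
        return Outcome(1, FinalReply.ACK)
    if (option is Option.INFO and not client_requested_block_size
            and native_min_block > 1):
        # Case 1 with a native size > 1: send the information, but
        # conclude with the error instead of the ACK.
        return Outcome(native_min_block, FinalReply.ERR_BLOCK_SIZE_REQD)
    # Case 1 with native size 1, case 2, and case 4: advertise the
    # native size and conclude with success.
    return Outcome(native_min_block, FinalReply.ACK)
```

With a native minimum of 512, negotiate(Option.INFO, False, 512) ends in
the error while negotiate(Option.GO, False, 512) succeeds with a minimum
of 1. In this model NBD_OPT_GO never fails, so a successful NBD_OPT_INFO
is never followed by a failing NBD_OPT_GO with the same parameters.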
In that last case, if the client then sends an unaligned request, it was
the client that violated the protocol, so all bets are now off (the
server will happen to honor the request, rather than disconnect or fail
with the suggested EINVAL, but the client was out-of-spec so it
shouldn't be relying on any particular server behavior, whether success
or a particular error).

Let me know if any of the above reasoning is wrong, or if we should try
harder to document this in the spec to make it clear to server
implementors how they can be portable to the largest number of clients.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org