On 04/25/2016 07:50 AM, Alex Bligh wrote: > A server can determine whether a client supports block sizes by > the fact it uses NBD_OPT_INFO or NBD_OPT_GO, as these are part > of the same extension. Therefore there is no need for > NBD_OPT_BLOCK_SIZE, or for NBD_REP_ERR_BLOCK_SIZE_REQD. This > substantially simplifies the extension. > > Signed-off-by: Alex Bligh <alex@...872...> > --- > doc/proto.md | 76 ++++++++++++++++-------------------------------------------- > 1 file changed, 20 insertions(+), 56 deletions(-) > > Let's see what people think before I merge this. > > +If a client can honor server block size constraints (as set out below > +and under `NBD_INFO_BLOCK_SIZE`), it SHOULD announce this during the > +handshake phase by using `NBD_OPT_INFO`, and SHOULD use `NBD_OPT_GO` > +rather than `NBD_OPT_EXPORT_NAME`. In my qemu patches posted to date, I didn't bother to send NBD_OPT_INFO, but went straight to NBD_OPT_GO. I still think that a single NBD_OPT_GO is sufficient for the server to know that the client is new enough to understand and support block sizes, without having to require the client to send NBD_OPT_INFO. > A server with block size > +constraints other than the default SHOULD advertise the block size > +contraints during handshake phase via `NBD_INFO_BLOCK_SIZE` in response > +to `NBD_OPT_INFO` or `NBD_OPT_GO`. A server and client MAY agree on > +block sizes via out of band means. A server is entitled to assume > +that a client that issues `NBD_OPT_INFO` or `NBD_OPT_GO` will either > +support the blocksize constraints it has supplied using > +`NBD_INFO_BLOCK_SIZE`, or will not continue the session. The rest reads okay. It covers the case of a client that issues NBD_OPT_INFO to learn block sizes, but then falls back to NBD_OPT_EXPORT_NAME to start transmission phase (I don't know why anyone would write such a client, but it doesn't hurt for the server to support it). In my qemu implementation, one of the things I was up against was a question on what block sizes I should advertise to the client. I fixed a bug where qemu was mis-behaving on requests not aligned to 512 bytes, and in so doing, found out that with that fixed, qemu can always support a minimum alignment of 1. But the ability to advertise a larger minimum alignment (of 512, or of the size mandated by O_DIRECT, or whatever other reason) seems like a rather compelling optimization to be able to make, and I was debating about the presence or absence of NBD_OPT_BLOOCK_SIZE as the flag for whether to stick to an advertisement of a minimum size of 1 vs. 512. Under the simplification here, the result would be that qemu always advertises a minimum of 512 or larger for a client new enough to use NBD_OPT_GO, but always supports a minimum of 1 for clients that used NBD_OPT_EXPORT_NAME; which means I have to defer the choice of O_DIRECT during open() until after the client has requested to go into transmission phase. Qemu is able to probe the minimum alignment of an O_DIRECT file, but that requires the file to already be open()d. So that makes it a bit of a chicken-and-egg - how do you learn what sizes to advertise during NBD_OPT_INFO (which requires an open fd) if you don't want to open the fd until after knowing whether the client will use NBD_OPT_GO or NBD_OPT_EXPORT_NAME (since the decision on whether to use O_DIRECT during open depends on the client's choice of going into transmission phase). I'm not opposed to the simplification, just making sure we think through the ramifications. > -- `NBD_OPT_BLOCK_SIZE` (9) > - > - The client wishes to inform the server of its intent to obey block > - sizes. The option request has no additional data. > - > - Some servers are able to make optimizations, such as opening files > - with O_DIRECT, if they know that the client will obey a particular > - minimum block size, where it must fall back to safer but slower code > - if the client might send unaligned requests. This was the only mention of O_DIRECT; we probably ought to mention O_DIRECT in the "Block Sizes" section as a justification for why a server might want to advertise non-default sizes, if we get rid of the mention here. Otherwise, I like where this is headed in terms of simplification, and it's only a few minor tweaks to my existing qemu patches to update to the changes here. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature