Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands

To: Eric Blake <eblake@redhat.com>, Wouter Verhelst <wouter@grep.be>
Cc: nbd@other.debian.org, qemu-block@nongnu.org, w@uter.be, nsoffer@redhat.com, den@openvz.org
Subject: Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date: Wed, 18 Mar 2020 16:10:50 +0300
Message-id: <[🔎] aa65119c-f46d-3b48-2f3c-1e69942ef283@virtuozzo.com>
In-reply-to: <[🔎] f4f52b87-5520-4b8c-1b1d-2e681e1ff514@redhat.com>
References: <20200206151547.5812-1-vsementsov@virtuozzo.com> <2f56c11f-3380-9b83-6873-2ec8f4c3d07d@virtuozzo.com> <[🔎] 20200317161859.GD27056@grep.be> <[🔎] fd6a5d69-36da-bb4f-b67d-44ef80577a88@virtuozzo.com> <[🔎] 20200318053337.GE27056@grep.be> <[🔎] 49e18291-7e7b-a363-c807-80d4cc589fd8@virtuozzo.com> <[🔎] 78efe37b-b54b-cc59-aec2-1ce39ec1676b@virtuozzo.com> <[🔎] 20200318080454.GH27056@grep.be> <[🔎] f4f52b87-5520-4b8c-1b1d-2e681e1ff514@redhat.com>

18.03.2020 15:22, Eric Blake wrote:

On 3/18/20 3:04 AM, Wouter Verhelst wrote:
On Wed, Mar 18, 2020 at 09:19:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
OK, understand. Reasonable thing, agreed. I'll resend.
Hmm. But we can't read in one go anyway, because we need to distinguish NBD_REQUEST_MAGIC
from NBD_EXTENDED_REQUEST_MAGIC..
Yes, that needs to happen at any rate, indeed. So the difference will be
two reads rather than three, instead of one read rather than two.

That's still an advantage.
Not much of one.  You're micro-optimizing the read()s, but in reality, the sender will likely send() the entire packet at once, at which point the data is in the kernel and the reads will succeed back-to-back, or you can even write the client to read into a buffer to minimize syscalls and then parse as much as needed out of the buffer.

You've got a LOT more overhead in the TCP packet and network transmission time than you do in deciding whether the server has to do 2 or 3 read()s per client message.

While it might be nice to design a message that doesn't need the server to do additional decision points mid-packet in determining how much packet remains, that should not be your driving factor. Even with current servers, that is already the case (the server has to decide mid-packet whether it is handling NBD_CMD_WRITE and thus has more data to read).
I think, that modern client will use mostly NBD_EXTENDED_REQUEST_MAGIC interface, so it will
be great to optimize it. But to read extended request in one go, we should make it
shorter than simple request, and it doesn't seem possible.

May be we should not support simple and extended requests together? May be better to force
using only extended requests if they are negotiated? Then we will read extended request in
on go, and we may just define it like
As extended requests already have to be negotiated, and no client nor server exists yet that supports them, we can indeed declare that on successful handshake of the new feature, a client may ONLY send extended requests. However, a server already has to handle both packet types (if the client doesn't request the feature) and a client already has to be able to send both packet types (for fallback when talking to a server that lacks the feature), so what does it buy us to require that when the feature is negotiated, only extended packets may be sent? I guess it boils down to whether the server implementation is simplified or made more complex, depending on whether we state that once negotiated both packet types are allowed (server must decide on a per-packet basis which type it is receiving) or whether only extended packets are allowed (server must now be more stateful in order to reject an ill-behaved client that sends wrong type). In fact, I argue that a server thatreplies to an extended packet even when an ill-behaved client that forgot to negotiate them is a reasonable server implementation (the client can't depend on that behavior, though).


Hmm, so the restriction doesn't help us, as we'll any way try to handle "wrong" type of the message.. Then, we, any way, will have two reads for extended request (if it is larger than simple request), as on first we should understand is it extended or not. So, I again don't see any benefit in forcing offset and length to be in the header, it's just less portable for the future.

So, if we go with my proposal as is, how will it work?

Smallest extended request is of 20 bytes length. Simple request is 28 bytes..

So:

1. read 20 bytes

2. if extended, read the tail defined by length
   else, read the tail, corresponding to simple request

- we always have two reads.

===

If we extened extended request headers by some fields, we can only optimize simple request, so it will be

1. read 28 bytes

2. if simple WRITE read corresponding tail
   if another simple request - we are done
   if extended, read corresponding tail. - For this, payload_length field should be placed in first 28 bytes, so it can't follow offset and bytes fields.

So, it will look like

C: 32 bits, 0x23876289, magic (`NBD_EXTENDED_REQUEST_MAGIC`)
C: 16 bits, flags
C: 16 bits, type
C: 64 bits, handle
C: 32 bits, length of payload (unsigned)
C: 64 bits, offset
C: 64 bits, length of requested region
C: *length* bytes of payload data (if *length* is nonzero)

which is again, a bit strange, and the only benefit is one read instead of two on simple request handling (except for WRITE), when most of requests will be extended anyway.

So, I'm for first scheme and my original proposal, as it is more flexible for future extensions.. Hmm about non-offset-size commands:
I can imagine Qemu extension, which will export qapi block-layer interface through NBD, why not?


C: 32 bits, 0x23876289, magic (`NBD_EXTENDED_REQUEST_MAGIC`)
C: 16 bits, flags
C: 16 bits, type
C: 64 bits, handle
C: 32 bits, length of payload (unsigned), MUST be greater or equal to 16
C: *length* bytes of payload data (if *length* is nonzero)

- so, we'll just read 36 bytes in one go, and then additional payload, if length
is more than 16.


That is, certainly, also an option, although I would define length of
payload to not include the offset and length bytes.


Agreed - mixing header length into the length of the payload field makes life a bit more confusing.  I'd rather have the length field be 0 when sending a minimum-sized extended request packet.




--
Best regards,
Vladimir

Reply to:

References:
- Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
  - From: Wouter Verhelst <wouter@grep.be>
- Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
  - From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
- Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
  - From: Wouter Verhelst <wouter@grep.be>
- Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
  - From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
- Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
  - From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
- Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
  - From: Wouter Verhelst <wouter@grep.be>
- Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
  - From: Eric Blake <eblake@redhat.com>

Prev by Date: Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
Next by Date: Re: [PATCH] nbd: make starting request more reasonable
Previous by thread: Re: [RFC v2] nbd/proto: introduce extended request and 64bit commands
Next by thread: memory leak in nbd_add_socket
Index(es):
- Date
- Thread