[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] NBD_CMD_DISC



On 9 Apr 2016, at 12:32, Wouter Verhelst <w@...112...> wrote:

> On Sat, Apr 09, 2016 at 12:14:02PM +0100, Alex Bligh wrote:
>> If you're arguing the reply doesn't add anything, in a sense you're
>> right, but only in the same way NBD_CMD_DISC itself doesn't add anything,
>> as you *could* just permit straight disconnections without a command
>> at all - they would be just as safe/unsafe as with NBD_CMD_DISC
>> as with NBD_CMD_DISC as currently documented a legally operating
>> client can actually disconnect intending to send NBD_CMD_DISC but
>> with it not actually reach the server.
> 
> My point is, it's fairly easy to clarify that that is not legal. If you
> *intend* to send NBD_CMD_DISC, you should ensure that you've actually done so
> before dropping the connection!

I'm not sure how you actually do that with existing TLS libraries. I'm
not sure we should make writing clients harder.

>>> Clients can send FLUSH if they want to be sure.
>> 
>> Well they can, but we could make life easier for them,
> 
> I don't see how "add another command with almost-but-not-quite the same
> semantics" makes life easier for anyone...

Well, the idea was to deprecate NBD_CMD_DISC. But as I have now
discovered we can probably do that without a new command.

>>> The server may *also* want to know when a client wants to disconnect,
>>> for whatever reason.
>> 
>> Sure, it will know when the TCP stream is terminated.
> 
> So will the client?

Not sure I understand. The client knows it wants to disconnect
because it knows it wants to disconnect - that's the assumption.

After the client has disconnected (currently) it typically
exits. The problem is this can happen prior to the TLS
side completing.

If you mean "will the client also want to know if the server
disconnects", yes it will want to know that. If it receives
a disconnect other than in response to NBD_CMD_DISC then
it knows something bad has happened. What happens after
NBD_CMD_DISC has been sent is more complicated:

* If there are inflight requests (situation not currently
  prohibited) it does not know whether these are served
  successfully or even attempted.

* It does not know whether the NBD_CMD_DISC was actually
  received or if a disconnect happened to occur at the same
  time.

* If no TCP disconnect occurs, it does not know how long to
  wait for one.

So the client waiting for the server to disconnect is also
not a great option.

>> That is the only notification it currently gets *now* in some
>> circumstances, as the NBD_CMD_DISC gets lost (occasionally) as the
>> client has exited before its TLS buffer gets flushed.
> 
> If the command gets lost, clearly the client didn't do its job
> correctly...

If we make it hard for the client to do its job, we should not
be surprised if that happens.

>>>> If you're concerned about saving commands, the easy answer is this:
>>> 
>>> I'm not concerned about saving commands so much as about complicating
>>> things too much. NBD is (or at the very least, used to be ;-) a fairly
>>> simple protocol to implement. That's a feature, one which I would like
>>> to keep.
>> 
>> This is a reasonable design objective and one I am grateful for
>> having implemented 2 NBD servers and worked on the reference
>> implementation) :-)
>> 
>>> If there is a reason why we can't define that a server SHOULD behave in
>>> a particular way upon CMD_DISC, then fine, we can add a CLOSE.
>>> Otherwise, I think this is complicating matters for little benefit.
>> 
>> OK, well having thought about it I *still* think NBD_CMD_DISC as
>> currently written is broken. I didn't come up with the problem as
>> a pipedream, I came up with it because I couldn't help wondering
>> why my server was spitting warning messages on every fifth
>> 'qemu-img info' command saying the client has disconnected without
>> sending an NBD_CMD_DISC.
> 
> That sounds like a bug in qemu-img then, rather than a bug in the
> protocol?

I'd be happy to fix it if I knew how. I don't.

>> I think there are two possible courses of action.
>> 
>> 1. Introduce a disconnect with a reply (either NBD_CMD_CLOSE
>>   or some sort of flag to indicate NBD_CMD_DISC will send
>>   a reply so the client can wait for it and close the channel
>>   itself).
>> 
>> 2. Make it legal for the client to close the communication when
>>   there is nothing in flight, and after a flush (if the
>>   server supports it). Make this the preferred method for
>>   disconnection.
> 
> 3. Clarify that NBD_CMD_DISC is an *intent* to close the connection, but
> that whatever is outstanding should still be handled before doing so. If
> a client sends FLUSH and DISC in rapid succession, the server should
> handle the FLUSH before handling the disconnect.
> 
> That doesn't require any semantic changes (the client needs to wait for
> the FLUSH response anyway), and gets you the certainty you're asking
> for.

You mean that a server should wait for all outstanding requests
to complete before processing NBD_CMD_DISC? (and presumably
clients should send no more requests after sending NBD_CMD_DISC).

That's fine, but it doesn't fix the TLS buffer issue; that's
orthogonal issue (one which you would presumably address by
saying 'fix the client').

>> On reflection, I think (2) is far easier. It has the following
>> advantages:
>> 
>> * Server implementation is trivial as a client disconnect
>>  needs to be monitored anyway. If the server is multithreaded
>>  it can produce a warning if there are inflight commands, else
>>  not.
>> 
>> * Client implementation is trivial too - simply don't bother
>>  sending the NBD_CMD_DISC. It should send a FLUSH first (which
>>  it should be doing anyway probably), and it should wait for
>>  all inflight commands to complete (which it presumably should
>>  be doing anyway).
> 
> Downside: servers now need to distinguish between "connection drop with
> outstanding requests" and "connection drop with *no* outstanding
> requests", and perhaps do different clean up in the latter case but not
> the former, because the former is an unclean disconnect and the latter
> isn't.

In practice it currently needs to do that anyway. Firstly because
in order to process the NBD_CMD_DISC (per your suggestion above) it
needs to wait until all inflight requests are processed, so it must
already have something that determines whether or not there are
inflight requests. Secondly because as demonstrated there are
already clients out in the wild which disconnect when there are no
outstanding requests without reliably sending a NBD_CMD_DISC message.

-- 
Alex Bligh







Reply to: