[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [Nbd] Question about the expected behaviour of nbd-server for async ops



On Sun, May 29, 2011 at 09:53:44AM +0100, Alex Bligh wrote:
> --On 29 May 2011 08:50:38 +0200 Wouter Verhelst <w@...112...> wrote:
> >> Handles can be reused only once the command in question is completed.
> >>
> >> You may process commands out of order, and reply out of order,
> >> save that
> >> a) all write commands *completed* before you process a REQ_FLUSH
> >>    must be written to non-volatile storage prior to completing
> >>    that REQ_FLUSH (though apparently you should, if possible, make
> >>    this true for all write commands *received*, which is a stronger
> >>    condition) [Ignore this if you don't set SEND_REQ_FLUSH]
> >
> > We already implement that stronger condition, because writes are handled
> > in the way they are received.
> 
> Yes
> 
> > It shouldn't be too hard to implement when
> > disordered handling of requests is done, either: stop handling incoming
> > requests when you receive a flush request; flag all outstanding requests
> > so you know when the flush can be done (after which you can start
> > handling incoming requests again); and handle the flush when all flagged
> > requests have been handled.
> 
> Stopping handling requests is more than you need to do (it's not a
> barrier)

What I meant is "stop processing incoming requests until you flag them
as 'need to be flushed'". Obviously, once you've flagged them as such,
you can start processing incoming requests again, in parallel to the
flush request.

> and may be undesirable given how often you get a REQ_FLUSH
> with ext4 (for instance). In a threaded model, to satisfy the weaker
> condition, a new thread could do the flush and reply. To satisfy
> the stronger condition, a new thread could do wait for any already queued
> writes to complete (allowing new ones into the queue), then flush and
> reply.

Right, that's what I meant.

[...]
> >> >     For a+b how does one report write errors that only appear after
> >> >     the reply? Report them in the next FLUSH request?
> >>
> >> You don't. To be safe, I'd error every write (i.e. turn the medium
> >> read only).
> >
> > I don't think errors that appear after the reply are possible in the
> > case of b (they are in the case of a, obviously)? Or what am I missing?
> 
> I think Goswin is asking "assume no FUA. I get a write request, and
> it succeeds (perhaps because I write it to my own internal
> cache). I subsequently decide to write that cache to disk, (perhaps
> because I REQ_FLUSH, or perhaps decide to flush for my own reasons)
> but that errors, perhaps because of an underlying disk error. How
> do I report that error?"

Ah, right.

> >> > * NBD_CMD_DISC: Wait for all pending requests to finish, close socket
> >>
> >> You should reply to all pending requests prior to closing the socket
> >> I believe, mostly as it's polite. I believe the current client doesn't
> >> send a disconnect until all replies are in,
> >
> > I believe so too, yes.
> >
> > [...]
> >> and I also think the server may behave a little badly here.
> >
> > How so?
> 
> I /believe/ what happens is that the server just closes the socket and
> quits.

True.

> I think it does not wait until any unsent replies have been sent
> (i.e replies that have been generated and are queued in the socket
> layer), which may result in them being lost. This is just code reading
> rather than testing. It's not a problem with the current client
> because it waits for all replies to come in before sending
> NBD_CMD_DISC.

Hmm. I must admit I hadn't considered what would happen if you closed a
socket that still had unsent data in its buffers; I guess I just blindly
assumed it would flush its buffers, but that might have been wrong.

This is what posix has to say about the matter:

       If  fildes  refers  to  a  socket, close() shall cause the socket to be
       destroyed. If the socket  is  in  connection-mode,  and  the  SO_LINGER
       option  is set for the socket with non-zero linger time, and the socket
       has untransmitted data, then close() shall block for up to the  current
       linger interval until all data is transmitted.

(http://pubs.opengroup.org/onlinepubs/009695399/functions/close.html)

So, in other words, we should probably be setting the SO_LINGER sockopt
on our sockets. I'll look into that.

> >> >   Should this flush data before closing the socket? And if so what if
> >> >   there is an error on flush? I guess clients should send NBD_CMD_FLUSH
> >> >   prior to NBD_CMD_DISC if they care.
> >>
> >> No, you should not rely on this happening. Even umount of an ext2 volume
> >> will not send NBD_FLUSH where kernel, client, and server support it.
> >> You don't need to write it then and there (in fact there is no 'then
> >> and there' as an NBD_CMD_DISC has no reply),
> >
> > It does have one -- the FIN packet. But yeah, it's not an
> > application-layer reply, that much is true.
> 
> Which is mildly irritating, particularly if you want to make sure the
> client doesn't do other stuff until you've processed the disconnect.

I suppose we should return an error if a client tries to do anything
else after having sent NBD_CMD_DISC.

> >> >   What if there are more requests after this while waiting for pending
> >> >   requests to finish? Should they be ignored or return an error?
> >>
> >> I believe it is an, um, undocumented implicit assumption that no
> >> commands are sent after NBD_CMD_DISC is sent. The current server
> >> just closes the socket, which will probably result in an EPIPE
> >> upstream if the FIN packet gets back before these other commands
> >> are written.
> >
> > The client will flush its outgoing queue before sending a disconnect
> > request. Indeed, if it didn't do that, badness would ensue.
> 
> Oh sure. I think Goswin was asking "can it change its mind and send
> other stuff afterwards?" - the answer is no, NBD_CMD_DISC must be
> the last command sent.

Right.

> I might try to improve the documentation. Ideally the protocol should
> be written up like an IETF draft (which I might just do).

That might be interesting, but then any protocol extensions would
require IETF updates. I'm not sure that's ideal. Currently nbd is a
Linux-internal protocol, and we can change it at will; we'd lose that
ability to some extent. Whether that's a fatal problem is, of course, up
for discussion.

Having said that, I do agree that having a more formal description of
the protocol is a good thing; proto.txt is a good start, but it's not
fully there yet.

At any rate, if there's going to be an IETF draft, I'm going to be a
(co-)author ;-)

-- 
The volume of a pizza of thickness a and radius z can be described by
the following formula:

pi zz a



Reply to: