
Re: [Nbd] libnbd



On Thu, Apr 18, 2013 at 11:36:51AM +0200, Wouter Verhelst wrote:
> So, it's probably time I start thinking about how to implement that.

+1.
 
> Since I want to end up with a clean API, it's probably best to start
> from what an ideal API would look like, and then implement that, rather
> than to start from the current code and try to nudge that into a clean
> API. That would probably fail; and having a clean API to work towards
> would also encourage me to clean up the current code where necessary.
> 
> To get a clean API, it's probably a good idea to first figure out what
> we would like to allow library users to do. I'm envisioning several use
> cases:
> 
> * Replacing the backend: something like qemu-nbd or gznbd would be
> implemented by having libnbd do "almost everything", except that the
> actual reads and writes are performed by that application.
> * Extending the backend: you'd notify libnbd somehow that you support
> this extra option in the protocol (which can then be negotiated with the
> client) or in the config file, and that if that option is enabled,
> this particular function needs to be called at a particular place during
> the handling of a request. We could use this to implement, say, the
> copy-on-write feature. This should be implemented carefully enough so
> that users can optionally choose to replace the copy-on-write
> implementation with something else; this could make sense for backends
> that natively support snapshots or similar features.
> * Extending the protocol: something like xnbd (which has an additional
> protocol message for synchronization with failover NBD servers) would
> notify libnbd that it supports an extra option (which can then be
> negotiated with the client). If this option is enabled during
> negotiation and the client then sends a particular message type or a
> message with a particular flag, a particular function should be called
> to handle it.
> * Alternate implementation of particular bits of the library, for
> performance improvements. For instance, one might wish to replace the
> select() etc calls with things like libevent.
> * Alternate protocol handling. For instance, someone might wish to
> implement unix domain socket handling, rather than TCP sockets.
> * Embedding. You would use everything from libnbd, but the main loop
> would be implemented elsewhere since you have an application that just
> happens to be exporting something over NBD, but does a load of other
> things as well.
> 
> I believe that's about it.
> 
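To make the first use case (replacing the backend) concrete: I'd
imagine libnbd exporting a struct of function pointers that something
like qemu-nbd or gznbd fills in. A rough sketch, with all names
invented here:

#include <sys/types.h>  /* ssize_t, off_t, size_t */

struct nbd_backend_ops {
        /* called by libnbd to service READ and WRITE requests */
        ssize_t (*read)(void *data, void *buf, size_t len, off_t offset);
        ssize_t (*write)(void *data, const void *buf, size_t len,
                         off_t offset);
        /* called for FLUSH (and after FUA writes) */
        int (*flush)(void *data);
        void *data;     /* opaque backend state, e.g. an open image */
};
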
> To get at something like that, I think it's pretty obvious we'll need a
> state machine. This would need to have the following features:
> * A function pointer to be invoked when entering a given state.
> * A condition which would cause the state to be entered. This could be
> things like "socket X is ready to be read", "we have an outstanding
> request and flag Y was negotiated with the client", "we have an
> outstanding request and flag Z was enabled in configuration".
> * For reasons of performance, *most* lookups in the state machine would
> preferably not use a hash table but O(1) algorithms instead. E.g., we
> could have a hash table of possible states out of which an actual state
> machine is built at accept() time for a client socket, which then uses
> ->next_state pointers or some such.
> * We'd need some functions to create new states.
> * There should be some API to be able to explicitly set a particular
> state, or to set a particular flag in the state machine.
> * Some states may need the state machine to skip or ignore if conditions
> aren't satisfied (e.g., a copy-on-write state would need to be
> skipped/ignored if the option isn't enabled; or a "sync data for this
> request after the write" state would need to be skipped if the request
> we're handling doesn't have the FUA flag set) while others may need
> the state machine to wait until all conditions are met (e.g., the "read
> data" state shouldn't be entered until the socket actually has data
> waiting). Maybe some states may need the state machine to wait in some
> cases but skip in others for one and the same state?
> 
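Such a state could look roughly like this (a sketch only, names
invented):

struct nbd_client;      /* per-connection state, defined elsewhere */

struct nbd_state {
        /* invoked when the state is entered */
        int (*enter)(struct nbd_client *c);
        /* returns nonzero when the state may be entered, e.g. "socket
         * is readable" or "flag Y was negotiated with the client" */
        int (*cond)(struct nbd_client *c);
        /* filled in at accept() time from the hash table of possible
         * states, so the fast path is plain pointer chasing */
        struct nbd_state *next;
};
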
> I'm undecided whether the state machine should be primarily linked to
> the socket or primarily linked to a request. In the former case,
> select() would just need to ensure that a state machine is moved from
> the "waiting for data"  to the "ready to read" state (or not touched if
> it isn't in the "waiting for data" state), which would be fairly easy to
> implement and shouldn't have a lot of performance issues, but would make
> handling requests in parallel fairly complicated. In the latter case,
> handling requests in parallel should be fairly trivial (we just read
> requests from a socket and create a new state machine instance), but
> doing so quickly might be an issue.

Maybe you need more than one. I would think there should be two "state
machines" on the server side and two on the client side:

-----------------------------------------------------------------------

- server / listener

This accepts new connections and handles feature negotiation with the
client. After negotiation, a new device "state machine" is created.

- device

This handles receiving requests and replying to them, and tracks the
state of a running connection. A device has a queue of requests and
functions to handle them.

-----------------------------------------------------------------------

- connecter

This is the client-side counterpart of the server / listener; it
handles feature negotiation with the server. After negotiation, a
client "state machine" is created.

- client

This handles sending requests to the server and receiving replies for
a running client connection. A client has a queue of requests and
callbacks for events (request finished / failed).

-----------------------------------------------------------------------
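
Roughly, and with all names made up:

struct nbd_request;
struct nbd_device_ops;          /* callbacks, sketched further down */

/* server / listener: accepts connections and negotiates */
struct nbd_listener {
        int sock;                       /* listening socket */
};

/* device: created per negotiated connection */
struct nbd_device {
        int sock;                       /* connected client socket */
        struct nbd_request *queue;      /* outstanding requests */
        const struct nbd_device_ops *ops;
};

/* accept + negotiate, then hand off to a new device */
struct nbd_device *nbd_listener_accept(struct nbd_listener *l);

/* the client side mirrors this */
struct nbd_connecter;   /* connect + negotiate */
struct nbd_client;      /* request queue plus completion callbacks */
struct nbd_client *nbd_connecter_connect(struct nbd_connecter *c);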

I'm not sure a state machine is the right thing to expose to the user
for device / client. Internally you have a state machine that switches
between receiving the request header and the data, and something that
keeps track of the progress of data being sent. But that would just be
internal handling of non-blocking IO.

Externally, for the user, I would make them objects with callbacks and
default functions. For example, the device would have callbacks for
read_request, read_payload, handle_request, ... The read_request
default would simply create a struct request and read it from the
device FD. On completion, if the request is a WRITE, the device then
goes into receiving-data mode and calls device->read_payload() to add
the data to the request. When the request and its data have been
received fully, device->handle_request() is called.

There should be callbacks for alloc_request/free_request to allow for
extended request structures, or for keeping a pool of pre-allocated
request structures and data buffers.

Most callbacks would have default implementations. Of those mentioned
so far, only handle_request would be required from the user, as shown
in the sketch below.
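
Something along these lines (a sketch; the names are invented, and
the defaults would come from libnbd):

struct nbd_device;
struct nbd_request;

struct nbd_device_ops {
        /* defaults: malloc()/free(); override to embed the request
         * in a larger struct or to keep a pool of pre-allocated
         * requests and data buffers */
        struct nbd_request *(*alloc_request)(struct nbd_device *d);
        void (*free_request)(struct nbd_device *d,
                             struct nbd_request *r);

        /* default: read the request header from the device FD */
        int (*read_request)(struct nbd_device *d,
                            struct nbd_request *r);

        /* default: for a WRITE, read the payload into the request */
        int (*read_payload)(struct nbd_device *d,
                            struct nbd_request *r);

        /* no default; the one callback the user must supply */
        int (*handle_request)(struct nbd_device *d,
                              struct nbd_request *r);
};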

Note: maybe look at fuse for how this looks in practice.
 
> Negotiation would need to be pretty much rewritten. Negotiation needs to
> be pretty much rewritten regardless, so that's not really an issue. I'm
> thinking of:
> * Having a data structure in which the key is the NBD_OPT_* value that
> the client would send
> * The value in that data structure would contain a function pointer (for
> options that need to calculate something) or just some data to send back
> (for options that only affect the state machine later on)
> * The negotiate() function would then just do the initial negotiation
> (NBDMAGIC, flags, etc) and loop over option haggling with a hash table
> rather than a switch() statement.
> 
> ...I think that pretty much covers it.
> 
> Thoughts? Anything I missed?
> 
> Thanks,

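For the option haggling table above, I'd imagine entries along these
lines (again a sketch, invented names):

#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uint32_t */

struct nbd_device;

/* one entry per NBD_OPT_* value, looked up in a hash table during
 * option haggling instead of a big switch() */
struct nbd_option_handler {
        uint32_t option;        /* the NBD_OPT_* value the client sends */
        /* for options that need to compute something */
        int (*handle)(struct nbd_device *d, const void *data, size_t len);
        /* for options that only set a flag later used by the state
         * machine: a canned reply */
        const void *reply;
        size_t reply_len;
};
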
Receiving and sending requests should support using a plain memory
buffer, iovecs, or a file descriptor (a pipe, for splicing) for the
data part.
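
One way to express that would be a tagged union for the data part
(sketch):

#include <stddef.h>     /* size_t */
#include <sys/uio.h>    /* struct iovec */

enum nbd_buf_type { NBD_BUF_MEM, NBD_BUF_IOVEC, NBD_BUF_FD };

struct nbd_buf {
        enum nbd_buf_type type;
        union {
                struct { void *base; size_t len; } mem;
                struct { struct iovec *iov; int cnt; } vec;
                struct { int fd; size_t len; } pipe;    /* for splice() */
        } u;
};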

Regards,
	Goswin


