
Re: [Nbd] libnbd



On Thu, Apr 25, 2013 at 02:51:21PM +0200, Wouter Verhelst wrote:
> Hi Goswin,
> 
> On 25-04-13 14:14, Goswin von Brederlow wrote:
> > On Thu, Apr 18, 2013 at 11:36:51AM +0200, Wouter Verhelst wrote:
> >> So, it's probably time I start thinking about how to implement that.
> > 
> > +1.
> 
> :-)
> 
> [...we need a state machine...]
> >> I'm undecided whether the state machine should be primarily linked to
> >> the socket or primarily linked to a request. In the former case,
> >> select() would just need to ensure that a state machine is moved from
> >> the "waiting for data"  to the "ready to read" state (or not touched if
> >> it isn't in the "waiting for data" state), which would be fairly easy to
> >> implement and shouldn't have a lot of performance issues, but would make
> >> handling requests in parallel fairly complicated. In the latter case,
> >> handling requests in parallel should be fairly trivial (we just read
> >> requests from a socket and create a new state machine instance), but
> >> doing so quickly might be an issue.
> > 
> > Maybe you need more than one. I would think there should be 2 "state
> > machines" for server and 2 for clients:
> 
> I should note that while I'm not opposed, in principle, to implementing
> the client side in this library, it's not where my focus will lie, at
> least not initially. After all, most useful nbd client implementations
> will just use the kernel and don't need a state machine of their own
> (though I'm sure there will be exceptions).

Isn't the initial negotiation done solely in the userspace client, with
only the requests handled in the kernel? I think at least that part
should be in a library.

> > -----------------------------------------------------------------------
> > 
> > - server / listener
> > 
> > This accepts new connections and handles feature negotiation with the client.
> > After negotiation a new device "state machine" is created.
> > 
> > - device
> > 
> > This handles receiving and replying to requests and the state of a
> > running connection. A device has a queue of requests and functions to
> > handle them.
> 
> I think you misunderstand what I meant when I said "the socket" above.
> Yes, we'll need multiple state machines at any rate, one per client and
> one for the listening socket. But in that case, each state machine would
> be linked to one specific socket; the "server / listener" state machine
> that you're talking about above is just linked to the "main" server
> socket, whereas the "device" ones are linked to individual client's
> sockets. In theory, that could be implemented with two different state
> machine implementations, but I'm not sure we actually need that.

Ok.

And yes, the generic state machine implementation could be the same.
Just the data used to create the machine would differ.
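
A minimal sketch of what I mean (nbd_fsm_new() and all the other names
are invented here, purely for illustration):

struct nbd_fsm {
  int state;                              /* current state             */
  int fd;                                 /* the socket it is tied to  */
  void (*on_ready)(struct nbd_fsm *);     /* run when select() reports
                                             the fd as ready           */
  void *data;                             /* listener/device specifics */
};

/* hypothetical constructor; same machinery, different data: */
struct nbd_fsm *listener = nbd_fsm_new(server_fd, accept_client, NULL);
struct nbd_fsm *device   = nbd_fsm_new(client_fd, read_request, export);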
 
> What I meant with a state machine linked to a request is that we'd have
> some piece of code which loops over select() calls, and reads in data
> from a socket. If that looks like a request, it creates a *new* state
> machine just for that request, ensures it has all the data that it needs
> (which may involve reading more from the socket than just the request
> header), and starts it. This state machine is then responsible for
> handling all the "figure out where data is", "read data from disk", and
> "write data back to socket" function calls. Doing so would make it much
> easier for us to handle requests in parallel (if we're not finished
> reading from a particular backend file yet, we can just go do something
> else in the meantime).

But wouldn't that all be the server implementation? I don't see where
that would interact with libnbd at all. libnbd hands the server code
the request after it has been received. The server then does whatever
it wants with the request and at some point returns it to libnbd for
sending back. Any state changes in the request in between those two
points would be by the server and for the server.

The server could use a state machine or threads or do one request at a
time or something else. I don't think the libnbd should dictate what
that is.

If the server uses a state machine then the alloc_request() callback
would create a new one. It could be declared as

struct my_state_machine {
  struct nbd_request req; // must be the first member
  int my_state;
  void *foo;
};

libnbd would then fill in the embedded struct nbd_request and
eventually call handle_request(), which starts the state machine.

libnbd could provide a default, state-machine-based server
implementation along those lines.
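
To make that concrete, a rough sketch of the two callbacks
(handle_request() and the structs are from above, the rest is made up):

#include <stdlib.h>

struct nbd_request *my_alloc_request(void) {
  /* allocate the extended structure; libnbd only ever touches the
     embedded struct nbd_request */
  struct my_state_machine *sm = calloc(1, sizeof(*sm));
  if (!sm)
    return NULL;
  sm->my_state = 0;    /* initial state */
  return &sm->req;
}

void my_handle_request(struct nbd_request *req) {
  /* cast is safe because struct nbd_request is the first member */
  struct my_state_machine *sm = (struct my_state_machine *)req;
  /* ...advance sm->my_state until the request can be replied to... */
}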

The idea should be that libnbd provides a number of parts that you can
put together to make an NBD server, and that any of those parts can be
swapped out for your own implementation, so existing behaviour is easy
to extend or replace.

My idea would be more object oriented: if you don't want the default
write function, you supply your own; if you want to extend it, your
own implementation can call the original function in between doing its
own thing.

Your idea seems to be that you reprogram the state machine to include
new states. Maybe it is just a different way of saying the same thing.

> Come to think of it though, there's no reason why we can't do both: have
> a state machine per socket, *and* have a state machine per outstanding
> request. The (code invoked by the) socket state machine just reads data
> from the socket, creates a "handle this request" state machine, and
> forgets about it. When the "handle this request" state machine figures
> out what needs doing, it just creates a "write this reply" event and
> posts that to the socket state machine.
> 
> Does that make sense?

Yes. I think we are going in the same direction.

I just wouldn't require that requests are handled as a state machine.
The receiving/sending side and the request handling side don't
interact much: it's just create, handle, reply, free, four little
functions called in that order (see the sketch below). So I don't
think it would be hard to make this flexible enough to allow any kind
of implementation of the handling.
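
Roughly, per request (function names invented for illustration):

struct nbd_request *req = ops->alloc_request();  /* 1. create */
nbd_read_request(sock, req);    /* libnbd fills in header and payload */
ops->handle_request(req);       /* 2. handle: entirely the server's   */
                                /*    business how it does this       */
/* ...later, from whatever context the server chose: */
nbd_send_reply(sock, req);      /* 3. reply */
ops->free_request(req);         /* 4. free  */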
 
> > -----------------------------------------------------------------------
> > 
> > - connector
> > 
> > This is the other side of the server / listener and handles feature
> > negotiation with the server. After negotiation a client "state
> > machine" is created.
> > 
> > - client
> > 
> > This handles sending requests to the server and receiving replies
> > for a running client connection. A client has a queue for requests and
> > callbacks for events (request finished / failed).
> > 
> > -----------------------------------------------------------------------
> > 
> > I'm not sure a state machine is the right thing for device / client
> > for the user.
> 
> I'm not saying I will certainly expose the state machine. But to do
> things right, I'll need some better way of keeping state than what we're
> doing now, and doing that right requires some design. That's what I'm
> trying to do here :-)
> 
> A proper state machine should give us the required flexibility to allow
> calls that would do something along the lines of "call this function
> after reading from the socket but before writing to disk, so I can
> implement my own copy-on-write scheme" or some such. Such a call should
> indeed not need to know about the internals of the whole system, it
> should just be something along the lines of a
> "nbd_add_hook_before(handle, NBD_WRITE, funcptr)", where handle could be
> something that points to a specific export, the system as a whole, or
> something in between.

Or

struct nbd_request_ops ops = {
  .data_read  = nbd_default_ops.data_read,  // keep the default read
  .data_write = cow_write,                  // override the write path
  ...
};

void cow_write(...) {
  if (need_cow) {
    // do COW
  }
  nbd_default_ops.data_write(...);
}

I'm pretty sure hook_before/after isn't enough. A
nbd_replace_hook(handle, NBD_WRITE, funcptr) would be needed too.
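
So roughly, as hypothetical calls (none of these exist yet):

nbd_add_hook_before(handle, NBD_WRITE, my_cow_check);  /* runs first */
nbd_replace_hook(handle, NBD_WRITE, my_cow_write);     /* replaces the
                                                          default write */
nbd_add_hook_after(handle, NBD_WRITE, my_stats_hook);  /* runs last */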
 
> > Internally you have a state machine that switches
> > between receiving request header and data. And something that keeps
> > track of the progress of data being sent. But that would just be
> > internal handling of non-blocking IO.
> > 
> > Externally, for the user, I would make them objects with callbacks and
> > default functions. For example the device would have callbacks for
> > read_request, read_payload, handle_request, ... The read_request
> > default would simply create a struct request and read it from the
> > device FD. On completion, if the request is a WRITE, the device then
> > goes into receiving data mode and calls device->read_payload() to add
> > the data to the request. When request and data have been fully
> > received, device->handle_request() is called.
> 
> Yes, something like that is the intention -- though I don't think I'll
> migrate to C++ (C with function pointers works just as well, and doesn't
> have all those many pitfalls that C++ does have).

Yes, stick with C there. Interfacing C++ with other stuff only adds
problems.
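
In plain C that could be a callback table along these lines (a sketch;
all names invented, NULL meaning "use the library default"):

struct nbd_device;      /* opaque per-connection state */
struct nbd_request;

struct nbd_device_ops {
  int (*read_request)(struct nbd_device *, struct nbd_request *);
  int (*read_payload)(struct nbd_device *, struct nbd_request *);
  int (*handle_request)(struct nbd_device *, struct nbd_request *);
};

int my_handle(struct nbd_device *dev, struct nbd_request *req);

/* only handle_request is mandatory; NULL falls back to the default */
struct nbd_device_ops my_ops = {
  .read_request   = NULL,
  .read_payload   = NULL,
  .handle_request = my_handle,
};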
 
> > There should be callbacks for alloc_request/free_request to allow for
> > extended structures for requests or for keeping a pool of
> > pre-allocated requests structures and data buffers.
> > 
> > Most callbacks would have default implementations. From those
> > mentioned so far only handle_request would be required from the user.
> > 
> > Note: maybe look at FUSE for how this looks in practice.
> >  
> >> Negotiation would need to be pretty much rewritten. Negotiation needs to
> >> be pretty much rewritten regardless, so that's not really an issue. I'm
> >> thinking of:
> >> * Having a data structure in which the key is the NBD_OPT_* value that
> >> the client would send
> >> * The value in that data structure would contain a function pointer (for
> >> options that need to calculate something) or just some data to send back
> >> (for options that only affect the state machine later on)
> >> * The negotiate() function would then just do the initial negotiation
> >> (NBDMAGIC, flags, etc) and loop over option haggling with a hash table
> >> rather than a switch() statement.
> >>
> >> ...I think that pretty much covers it.
> >>
> >> Thoughts? Anything I missed?
> >>
> >> Thanks,
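
That data structure could look something like this (types and field
names made up, just to illustrate):

#include <stddef.h>
#include <stdint.h>

struct client;                    /* per-connection state, hypothetical */

struct nbd_opt_entry {
  uint32_t option;                /* key: the NBD_OPT_* value           */
  int (*handle)(struct client *,  /* for options that must compute      */
                void *data, size_t len);
  const void *reply;              /* canned reply, sent as-is when      */
  size_t reply_len;               /*   handle is NULL                   */
};

negotiate() would then just look each received option up in the hash
table and either call handle() or send back the canned reply.
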
> > 
> > Receiving and sending requests should support using a plain memory
> > buffer, iovecs or a file descriptor (pipe for splicing) for the data
> > part.
> 
> My plan, for now, is to make the default implementation use
> writev()/readv() and friends, and have iovecs. However, the goal is for
> the API to be flexible enough to allow people to change that without
> having to reimplement *everything* (they'd just reimplement the "read"
> and "write" functions, everything else would remain the same).

+1.
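
To illustrate (names invented; the defaults would just wrap
readv()/writev(), and a replacement could e.g. splice from a pipe
instead):

#include <sys/uio.h>

struct nbd_io_ops {
  ssize_t (*read)(int fd, struct iovec *iov, int iovcnt);
  ssize_t (*write)(int fd, const struct iovec *iov, int iovcnt);
};

/* default implementations: plain scatter/gather IO */
static ssize_t default_read(int fd, struct iovec *iov, int iovcnt) {
  return readv(fd, iov, iovcnt);
}

static ssize_t default_write(int fd, const struct iovec *iov, int iovcnt) {
  return writev(fd, iov, iovcnt);
}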

MfG
	Goswin


