Re: [Nbd] libnbd

To: Wouter Verhelst <w@...112...>
Cc: nbd-general@lists.sourceforge.net
Subject: Re: [Nbd] libnbd
From: Goswin von Brederlow <goswin-v-b@...186...>
Date: Mon, 29 Apr 2013 11:45:59 +0200
Message-id: <20130429094559.GC6252@...1266...>
In-reply-to: <51795C39.80103@...112...>
References: <516FBEB3.3070806@...112...> <20130425121417.GB15210@...1266...> <517926C9.4000804@...112...> <20130425144930.GC15210@...1266...> <51795C39.80103@...112...>
On Thu, Apr 25, 2013 at 06:39:21PM +0200, Wouter Verhelst wrote:
> Hi,
> 
> On 25-04-13 16:49, Goswin von Brederlow wrote:
> > On Thu, Apr 25, 2013 at 02:51:21PM +0200, Wouter Verhelst wrote:
> >> Hi Goswin,
> [...]
> >>>> I'm undecided whether the state machine should be primarily linked to
> >>>> the socket or primarily linked to a request. In the former case,
> >>>> select() would just need to ensure that a state machine is moved from
> >>>> the "waiting for data"  to the "ready to read" state (or not touched if
> >>>> it isn't in the "waiting for data" state), which would be fairly easy to
> >>>> implement and shouldn't have a lot of performance issues, but would make
> >>>> handling requests in parallel fairly complicated. In the latter case,
> >>>> handling requests in parallel should be fairly trivial (we just read
> >>>> requests from a socket and create a new state machine instance), but
> >>>> doing so quickly might be an issue.
> >>>
> >>> Maybe you need more than one. I would think there should be 2 "state
> >>> machines" for server and 2 for clients:
> >>
> >> I should note that while I'm not opposed, in principle, to implementing
> >> the client side in this library, it's not where my focus will lie, at
> >> least not initially. After all, most useful nbd client implementations
> >> will just use the kernel and don't need a state machine of their own
> >> (though I'm sure there will be exceptions).
> > 
> > Isn't the initial negotiation done solely in the userspace client and
> > only the requests in kernel? I think at least that part should be in a
> > lib.
> 
> No. nbd-client is, and will always be, a very small footprint program,
> since it needs to be able to run from initrd and similar things. I
> intend to use libglib's helper functions extensively inside libnbd; that
> would incur a massive footprint increase. That simply is Not An Option(TM).
> 
> (I might add much of what nbd-client does as separate calls in libnbd
> for the benefit of others, but again, that won't be where my initial
> focus would lie)

There could be a small footprint libnbd-client.so.

Personally i don't really care about the size of the executable +
libraries. That is just mmaped. Parts that are not used don't take up
memory. Any increase in memory will be temporary only till it starts
the kernel part too.

> > If the server uses a state machine then the alloc_request() callback
> > would create a new one. It could be declared as
> > 
> > struct my_state_machine {
> >   struct nbd_request; // must be first
> >   int my_state;
> >   void * foo;
> > };
> > 
> > The libnbd would then will in the struct nbd_request and eventually
> > call handle_request(). That then starts the state machine.
> 
> I'm not sure I understand what you're on about here. Can you elaborate?

The interface for the main loop would require you to pass something to
handle requests. You would have for example these 3 callbacks:

struct request_ops {
  struct nbd_request * (*alloc_request)();
  void (*handle_request)(struct nbd_request *);
  void (*free_request)(struct nbd_request *);
};

The struct request_ops would be the visible interface between the main
loop and the request handling. People can implement their own or use
the one from libnbd. 

The libnbd internal implementation of the nbd request handling would
be a state machine that requires some extra data stored with each
request to track the state. So what you would do is declare a struct
my_state_machine that extends the struct nbd_request. In OO terms make
a derived class with struct request_ops being the virtual function
table.
 
> > libnbd could provide a default server implementation that is state
> > machine based like that.
> > 
> > The idea should be that libnbd provides a number of parts that you can
> > put together to make an NBD server.
> 
> Yes, that's the idea.
> 
> > And you can replace some of the
> > parts with your own implementation. So that one can easily extend or
> > replace existing parts.
> > 
> > My idea would be more obejct oriented. If you don't want the default
> > write function you supply your own. If you want to extend it your own
> > implementation can call the original function inbetween doing its own
> > thing.
> > 
> > Your idea seem to be that you reprogramm the state machine to include
> > new states. Maybe it is just a different way of saying the same.
> 
> Probably :-)
> 
> [...]
> > I just wouldn't require that requests are handled as state machine.
> > The interface between receiving and sending requests and handling them
> > don't interact much. It's just create, handle, reply, free. 4 little
> > functions that will be called in that order.
> 
> Well, here's the thing: "handle" in current nbd-server consists of:
> - Check the type of the request (read/write/trim/flush/disconnect)
> - read data from the socket, if this is a write request

That is part of parsing the protocol and as such part of the select
loop.

> - figure out if we're using copy-on-write; if so, redirect the relevant
> parts of the request to the diff file
> - figure out if we're using the multiple file option; if so, distribute
> the request to the relevant files from our "multiple" export.
> - Read data from the correct file into a buffer

That is part of the request handling.
 
> I suppose the "write data to socket" part would go into the "reply" above.

Yes.
 
> That's a lot of stuff to do for a "little" function. The current code
> just has the function which implements the first bullet point in the
> above list call the next function. That works, I suppose, but it's
> extremely non-flexible.
> 
> If we're going to allow people to replace (or disable?) some of that
> functionality, they can't be directly calling eachother anymore. I think
> a state machine there is the most flexible way to make that possible
> (although I could certainly be wrong about that).

The "little" function handle_request() would be, in your case, start a
complex state machine. The implementation of the state machine can
have before/after hooks and replacable states and such. That is all fine.

But I would like to see a clear seperation of the select loop and
protocol parsing and handling requests. As I see it most users of
libnbd will keep the protocol parsing but will replace the request
handling. Even if you add some new request type to the procotol that
should probably be done through some hook to extend the protocol
rather than replacing that part of the lib completly. The request
handling though is where the different nbd servers truely differ.

> > So I don't think it would
> > be hard to make this flexible enough to allow any kind of
> > implementation of the handling.
> 
> Well, that's the goal -- the big question is how to do that :-)

I think you are on the right track with having extensible state
machines already. But you haven't considered cleanly seperating the
initial negotiation, select loop / protocol parsing and the request
handling parts enough.

> >>> -----------------------------------------------------------------------
> >>>
> >>> - connecter
> >>>
> >>> This is the other side for the server / listener and handles feature
> >>> negotiation with the server. After negotiation a client "state
> >>> machine" is created.
> >>>
> >>> - client
> >>>
> >>> This is handles sending requests to the server and recieving replies
> >>> for a running client connection. A client has a queue for requests and
> >>> callbacks for events (request finsihed / failed).
> >>>
> >>> -----------------------------------------------------------------------
> >>>
> >>> I'm not sure a state machine is the right thing for device / client
> >>> for the user.
> >>
> >> I'm not saying I will certainly expose the state machine. But to do
> >> things right, I'll need some better way of keeping state than what we're
> >> doing now, and doing that right requires some design. That's what I'm
> >> trying to do here :-)
> >>
> >> A proper state machine should give us the required flexibility to allow
> >> calls that would do something along the lines of "call this function
> >> after reading from the socket but before writing to disk, so I can
> >> implement my own copy-on-write scheme" or some such. Such a call should
> >> indeed not need to know about the internals of the whole system, it
> >> should just be something along the lines of a
> >> "nbd_add_hook_before(handle, NBD_WRITE, funcptr)", where handle could be
> >> something that points to a specific export, the system as a whole, or
> >> something in between.
> > 
> > Or
> > 
> > struct nbd_request_ops ops = {
> >   .data_read = nbd_default_ops.data_read;
> >   .data_write = cow_write;
> 
> That can't work. The "default" read would read from the original file,
> the write would write to the diff file. The client will now ask for its
> data, and get the original data back.

Maybe I've been thinking of LVM snapshots too much. There the cow
saves the original data before alowing the original device to be
altered. The snapshot remains in the original state while the file
changes. (ok the snapshot can be written to too but that wasn't the
idea for the above).

> >   ...
> > };
> > 
> > void cow_write(...) {
> >   if (need_cow) {
> >     // do COW
> >   }
> >   nbd_default_ops.data_write(...);
> > }
> 
> So now we're back to "first function calls the next". Makes it pretty
> inflexible.

That isn't calling the next, that just uses the default function after
having dealt with the cow part.

But you are probably right about being inflexible. The way requests
are actually handled in a server will differ greatly. A simple server
that handles one request at a time in order will be straight forward.
Doing requests in parallel is still simple as each would be a seperate
instance of the state machine. But adding async IO changes the
structure completly since now the read_data() function will return
without having any data read and the main loop has to wait for
the async IO to complete.

You probably need 2 different interfaces or state machines to mode
sync and async IO. They might not be representable nicely with just
one.

For a more extreme example in my case (MAID - Massive Array of
Independen Disks [with redundancy]) I have a state machine that has
easily over 100 states to deal with IO being async, write journaling,
redundancy calculations, checksumming, error correction and so on.
There can also be background jobs that do scrubbing, resyncing and
reshaping that need to interact with normal operations. I'm pretty
certain for that I have to completly replace the request handling and
only keep the negotiation and protocol parsing.
 
> > I'm pretty sure hook_before/after isn't enough. A
> > nbd_replace_hook(handle, NBD_WRITE, funcptr) would be needed too.
> 
> The idea would be that the COW function wouldn't do the actual reading
> or writing; it would only change data in the request so that the next
> function, which does the actual writing, redirects its writes to the
> copy-on-write file. Additionally, it would change data so that
> subsequent reads would *get* the data from the copy-on-write file if
> we're trying to read from a region that was written to before in this
> session, but get it from the original file if not.

That sounds a bit like a device mapper level inbetween the state
machine and the code that actualy does read/write.
 
MfG
	Goswin
Reply to:
Follow-Ups:
- Re: [Nbd] libnbd
  - From: Wouter Verhelst <w@...112...>
References:
- [Nbd] libnbd
  - From: Wouter Verhelst <w@...112...>
- Re: [Nbd] libnbd
  - From: Goswin von Brederlow <goswin-v-b@...186...>
- Re: [Nbd] libnbd
  - From: Wouter Verhelst <w@...112...>
- Re: [Nbd] libnbd
  - From: Goswin von Brederlow <goswin-v-b@...186...>
- Re: [Nbd] libnbd
  - From: Wouter Verhelst <w@...112...>
Prev by Date: Re: [Nbd] libnbd
Next by Date: Re: [Nbd] libnbd
Previous by thread: Re: [Nbd] libnbd
Next by thread: Re: [Nbd] libnbd
Index(es):
- Date
- Thread