Re: [Nbd] libnbd

To: nbd-general@lists.sourceforge.net
Subject: Re: [Nbd] libnbd
From: Wouter Verhelst <w@...112...>
Date: Thu, 25 Apr 2013 14:51:21 +0200
Message-id: <517926C9.4000804@...112...>
In-reply-to: <20130425121417.GB15210@...1266...>
References: <516FBEB3.3070806@...112...> <20130425121417.GB15210@...1266...>
Hi Goswin,

On 25-04-13 14:14, Goswin von Brederlow wrote:
> On Thu, Apr 18, 2013 at 11:36:51AM +0200, Wouter Verhelst wrote:
>> So, it's probably time I start thinking about how to implement that.
> 
> +1.

:-)

[...we need a state machine...]
>> I'm undecided whether the state machine should be primarily linked to
>> the socket or primarily linked to a request. In the former case,
>> select() would just need to ensure that a state machine is moved from
>> the "waiting for data"  to the "ready to read" state (or not touched if
>> it isn't in the "waiting for data" state), which would be fairly easy to
>> implement and shouldn't have a lot of performance issues, but would make
>> handling requests in parallel fairly complicated. In the latter case,
>> handling requests in parallel should be fairly trivial (we just read
>> requests from a socket and create a new state machine instance), but
>> doing so quickly might be an issue.
> 
> Maybe you need more than one. I would think there should be 2 "state
> machines" for server and 2 for clients:

I should note that while I'm not opposed, in principle, to implementing
the client side in this library, it's not where my focus will lie, at
least not initially. After all, most useful nbd client implementations
will just use the kernel and don't need a state machine of their own
(though I'm sure there will be exceptions).

> -----------------------------------------------------------------------
> 
> - server / listener
> 
> This accepts new connects and handles feature negotiation with the client.
> After negotiation a new device "state machine" is created.
> 
> - device
> 
> This handles the recieving and replying to requests and the state of a
> running connection. A device has a queue of requests and functions to
> handle them.

I think you misunderstand what I meant when I said "the socket" above.
Yes, we'll need multiple state machines at any rate, one per client and
one for the listening socket. But in that case, each state machine would
be linked to one specific socket; the "server / listener" state machine
that you're talking about above is just linked to the "main" server
socket, whereas the "device" ones are linked to individual client's
sockets. In theory, that could be implemented with two different state
machine implementations, but I'm not sure we actually need that.

What I meant with a state machine linked to a request is that we'd have
some piece of code which loops over select() calls, and reads in data
from a socket. If that looks like a request, it creates a *new* state
machine just for that request, ensures it has all the data that it needs
(which may involve reading more from the socket than just the request
header), and starts it. This state machine is then responsible for
handling all the "figure out where data is", "read data from disk", and
"write data back to socket" function calls. Doing so would make it much
easier for us to handle requests in parallel (if we're not finished
reading from a particular backend file yet, we can just go do something
else in the mean time).

Come to think of it though, there's no reason why we can't do both: have
a state machine per socket, *and* have a state machine per outstanding
request. The (code invoked by the) socket state machine just reads data
from the socket, creates a "handle this request" state machine, and
forgets about it. When the "handle this request" state machine figures
out what needs doing, it just creates a "write this reply" event and
posts that to the socket state machine.

Does that make sense?

> -----------------------------------------------------------------------
> 
> - connecter
> 
> This is the other side for the server / listener and handles feature
> negotiation with the server. After negotiation a client "state
> machine" is created.
> 
> - client
> 
> This is handles sending requests to the server and recieving replies
> for a running client connection. A client has a queue for requests and
> callbacks for events (request finsihed / failed).
> 
> -----------------------------------------------------------------------
> 
> I'm not sure a state machine is the right thing for device / client
> for the user.

I'm not saying I will certainly expose the state machine. But to do
things right, I'll need some better way of keeping state than what we're
doing now, and doing that right requires some design. That's what I'm
trying to do here :-)

A proper state machine should give us the required flexibility to allow
calls that would do something along the lines of "call this function
after reading from the socket but before writing to disk, so I can
implement my own copy-on-write scheme" or some such. Such a call should
indeed not need to know about the internals of the whole system, it
should just be something along the lines of a
"nbd_add_hook_before(handle, NBD_WRITE, funcptr)", where handle could be
something that points to a specific export, the system as a whole, or
something in between.

> Internally you have a state machine that switches
> between recieving request header and data. And something that keeps
> track of the progress of data being send. But that would just be
> internall handling of non-blocking IO.
> 
> Externally, for the user, I would make them objects with callbacks and
> default functions. For example the device would have callbacks for
> read_request, read_payload, handle_request, ... The read_request
> default would simply create a struct request and read it from the
> device FD. On completion, if the request is a WRITE, the client then
> goes into receiving data mode and calls device->read_payload() to add
> the data to the request. When request and data has been received fully
> device->handle_request() is called.

Yes, something like that is the intention -- though I don't think I'll
migrate to C++ (C with function pointers works just as well, and doesn't
have all those many pitfalls that C++ does have).

> There should be callbacks for alloc_request/free_request to allow for
> extended structures for requests or for keeping a pool of
> pre-allocated requests structures and data buffers.
> 
> Most callbacks would have default implementations. From those
> mentioned so far only handle_request would be required from the user.
> 
> Note: maybe look at fuse for how this looks in practice.
>  
>> Negotiation would need to be pretty much rewritten. Negotiation needs to
>> be pretty much rewritten regardless, so that's not really an issue. I'm
>> thinking of:
>> * Having a data structure in which the key is the NBD_OPT_* value that
>> the client would send
>> * The value in that data structure would contain a function pointer (for
>> options that need to calculate something) or just some data to send back
>> (for options that only affect the state machine later on)
>> * The negotiate() function would then just do the initial negotiation
>> (NBDMAGIC, flags, etc) and loop over option haggling with a hash table
>> rather than a switch() statement.
>>
>> ...I think that pretty much covers it.
>>
>> Thoughts? Anything I missed?
>>
>> Thanks,
> 
> Receiving and sending requests should support using a plain memory
> buffer, iovecs or a file descriptor (pipe for splicing) for the data
> part.

My plan, for now, is to make the default implementation use
writev()/readv() and friends, and have iovecs. However, the goal is for
the API to be flexible enough to allow people to change that without
having to reimplement *everything* (they'd just reimplement the "read"
and "write" functions, everything else would remain the same).

-- 
Copyshops should do vouchers. So that next time some bureaucracy
requires you to mail a form in triplicate, you can mail it just once,
add a voucher, and save on postage.
Reply to:
Follow-Ups:
- Re: [Nbd] libnbd
  - From: Goswin von Brederlow <goswin-v-b@...186...>
References:
- [Nbd] libnbd
  - From: Wouter Verhelst <w@...112...>
- Re: [Nbd] libnbd
  - From: Goswin von Brederlow <goswin-v-b@...186...>
Prev by Date: Re: [Nbd] libnbd
Next by Date: Re: [Nbd] libnbd
Previous by thread: Re: [Nbd] libnbd
Next by thread: Re: [Nbd] libnbd
Index(es):
- Date
- Thread