
Re: [Nbd] [PATCH/RFC 0/3] Introduce TLS on nbdserver



Wouter,

As promised, thoughts on the complicated bit.

On 11 Apr 2016, at 12:42, Wouter Verhelst <w@...112...> wrote:

> Here's those thoughts I was referring to:
> 
> Currently, nbd-server consists of a series of functions, where function
> X implements feature Y by checking the CLIENT struct to see if feature Y
> is enabled, if so doing whatever necessary mangling, and then calling
> the next function in the list, *hardcoded*.

So the main part of my confusion was that I was talking about the
problems of bitbanging the network layer directly, whereas you're
talking about the problems of bitbanging the disk layer directly (and
layering that); that's very similar to qemu's approach, without (as I
said) the scary coroutines - on which more later.

The two have some things in common, but not everything. On the network
layer side you can't (in the general case) just call (e.g.) a read
function and expect it to do its work and return when it's done.
For example, in the OpenSSL case you can do a read (plaintext)
from the network, and in order to satisfy it, OpenSSL may need to
*write* to the network (some kind of TLS renegotiation). This can end
up in blocking and deadlock issues. This is one reason for qemu's
coroutines - nothing ever blocks. Adopting that would be a very
significant change (and to be honest we might as well say 'heh, let's
give up and work on qemu-nbd, as they've done all the hard work').

But I don't think you need to address the network the same way. The
*only* piece of network layering I can think of is TLS, and GnuTLS
is much friendlier - you either feed it an FD which it in essence
takes over (bar select(), if you are doing that), or you feed it
pointers to functions with similar semantics to send() and
recv().

So I think the network layer could use a much simpler abstraction.
As everything is blocking the way we use it at the moment (and I don't
see a need to change this), we could just have the read() and write()
calls replaced by function pointers with the same semantics, but which
automatically retry on EINTR and read or write until completion (or
error).

And those could be wrapped in things which e.g. read an entire option
in, or read an entire command in.

> With such an infrastructure, adding some dlopen() infrastructure for
> extra modules to add more such methods would also be possible.

That's interesting.

> The rest of nbd-server would then just call the first function in the
> linked list, and expect that function to call the next one in the list
> (if necessary/relevant).
> 
> The reason I haven't done this yet is that I'm not sure what the syntax
> of that backend_method thing should look like, and that I haven't worked
> out the whole backend infrastructure yet. I want to make it flexible
> enough, eventually, that things like "export three files in multifile
> mode, doing copy-on-write on the center file but not on the other two"
> or some such is possible (although a first implementation would
> certainly not *need* to do that, and could equally well restrict itself
> to being feature-equivalent to older nbd-server implementations).

I'm guessing one of the challenges here (and I'm looking at something
similar with gonbdserver) is whether you want to proactively optimise
for backends where parallelisation is helpful.

Imagine you are getting an NBD_CMD_WRITE to a Ceph backend, and let's
say the NBD_CMD_WRITE is very large. One approach would be to start
reading it and, *as it comes in*, break it into chunks and send
each chunk to a separate thread. This requires much more knowledge
on the network side. My answer to this is either 'no, expect the
client to do this', and/or 'let's have some other NBD_OPT to set
the (preferred?) maximum request size', so that the client can break
it up itself. Equally, imagine you get a read and are doing
a structured reply. Are you going to break that up into separate
parallel threads? I think in that case the answer is probably yes.

Another challenge in backend design is: are you going to assume
that the backends are threadsafe and may block, or are you going to
adopt a coroutine approach where everything is non-blocking? I'd
prefer the former.

> Once such a backend_method exists, there's no reason why a
> linked list of function pointers for the socket side of the equation
> could not also exist. These could then deal with the layering of TLS if
> necessary, and do the necessary handling of close operations, etc,
> abstracting away all the nitty gritty details of handling both encrypted
> and not-encrypted connections in the same code.
> 
> I had been planning to use the TLS stuff to start work on this
> refactoring (at least for the socket side of things), and have some
> tentative work in that area done; but it's not nearly as far down as
> what you've just sent, so I'm happy to junk that and continue on from
> this patch series.

Right - now I get the link :-) I think the gain on the network
layer side is much smaller than on the backend side, because there's
precisely one thing I can think of to layer, and it's not particularly
easy to do.

-- 
Alex Bligh
