
Re: [Nbd] Question about the expected behaviour of nbd-server for async ops



Wouter Verhelst <w@...112...> writes:

> I've been working on a multithreaded scatter/gather implementation of
> the backend; the current code is in a 'scatgat' branch (which needs
> updating for post-2.9.21 patches). It currently fails 'make check'
> though, so it's certainly not ready yet.
>
> That implementation, once complete, will read a request from the socket,
> check whether it is a read or write request and read the data in case of
> a write request, submit the request to a GThreadPool to read from or
> write to the backend storage, and use writev (which the man page tells
> me is supposed to be atomic) to write the entire reply to the socket at
> once (that is, the header plus the data in case of NBD_CMD_READ). So
> reads from the socket are done in the main thread, and writes are done
> from worker threads, and we don't need to use mutexes much, which is
> good.

Where do you still need mutexes?

> This would not care about ordering much, is a fairly simple
> implementation (once I get rid of the bugs), and should improve
> performance since reads and writes are no longer handled sequentially.

The other day on the train from work I put down some notes on how to do
nbd-server using splice for zero-copy operations, with very similar
results. The only difference is the writev part: the splice manpage says
nothing about splice() being atomic. So in pseudo code the server looks
like this:

Read thread:

1) allocate a struct job { struct request *req; int pipe[2]; int res; }
2) read a request from the socket into the job structure
3) create a pipe to hold the data and add it to the job structure
4) add the job to the IO queue
5) On WRITE: splice request->len bytes from the socket to the pipe
6) On DISC: wait for all IO threads to quit, add the job to the reply queue, quit
7) goto 1

IO Threads:

1) Wait for job in IO queue and remove it from the queue
On READ:
2a) add job to reply queue
3a) splice request->len bytes from disk to pipe
On WRITE:
2b) splice request->len bytes from pipe to disk
3b) add job to reply queue
On FLUSH/FUA:
2c) fsync()
3c) add job to reply queue
On DISC:
2d) add job back to IO queue /* make all IO threads die */
3d) quit
4) goto 1

Write thread:

1) Wait for job in reply queue and remove it from queue
2) write reply to socket
3) On READ splice request->len bytes from pipe to socket
4) close pipe
5) free job
6) On DISC quit
7) goto 1


One sore point is that in the IO threads a READ request is put into the
reply queue before the data is read from disk. This potentially (and
probably often) blocks the write thread. But if I reverse 2a/3a then the
splice may block forever: with the job not yet in the reply queue,
nothing drains the pipe, so a request larger than the pipe capacity
deadlocks.

Pipes and jobs could also be reused instead of creating/destroying them
all the time. A lockless queue could be used for this. Both the IO and
reply queue could be implemented as pipes or a socketpair, so no mutex
would be needed there. So I think this could work completely without
mutexes.


For a multi-disk server there would also be one master thread that
listens on the NBD port. On startup it would start the IO threads. On
connect it would start a read+write thread pair for the requested
device, and NBD_CMD_DISC would skip the IO queue and go directly to the
reply queue so the IO threads don't die.

MfG
        Goswin


