
Re: [Nbd] Yet another NBD server out there



On Sun, Mar 10, 2013 at 04:02:30PM +0100, Wouter Verhelst wrote:
> On Sat, Mar 09, 2013 at 11:36:22PM +0100, Michal Belczyk wrote:
> > On Sat, Mar 09, 2013 at 02:17:19PM +0100, folkert wrote:
> > > It also would just disappear without any messages in syslog.
> > 
> > Now this looks like a bug to me -- could you please file an issue for it
> > on the project's bitbucket site and provide more details?
> > Much appreciated!
> > I think we should discuss this in private as this mailing list is
> > definitely not the right place to solve any bnbd-server issues...
> 
> why not?

Oh come on, the bnbd project's bug tracker is the right place to submit
bugs.  It is a very fresh project (to me personally it was just a
kick-off), bugs are expected and welcome, and I will not spam your
mailing list...
It is obvious I was not able to test it in all possible environments
myself -- there has been no official release; the only thing that has
happened is that the code was made publicly available!
Actually I did not expect that many questions from you guys, and this
just reminds me that I really should work on some documentation...


> [...]
> > > > It is a true network _block_ device, not a network _memory_ device as it
> > > > does not take any advantage of the buffer cache on the data origin
> > > > server.
> > > 
> > > What's the advantage of that?
> > 
> > There are both advantages and disadvantages -- it all depends on the use
> > case and it all depends on what you expect from a block device...
> > I do not expect more than the underlying physical device's raw
> > performance,
> 
> I haven't looked at the code, but can you explain how this differs from
> the "sync" option in nbd-server?

It differs, a lot.
What the well-known nbd-server does is a loop of:

  1) read a request from the socket
  2) issue the request to the "disk"
  3) send the reply back to the client, goto 1)

The only thing that makes it work fast is that the "disk" is in fact a
buffer cache.
I do not like the idea of a block device server being in fact a
write-back cache for the underlying physical device.  Ignore my reasons
for it -- perhaps I wanted to try something new and do some research on
the protocol itself, perhaps I was bored; it does not matter.
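
In rough C terms, a minimal sketch of that classic loop as I read it
(read_request() and send_reply() are placeholders I made up for the
wire-protocol handling, not nbd-server's actual functions):

  /* Sketch of the synchronous serving loop described above.
   * read_request()/send_reply() are hypothetical placeholders. */
  #include <unistd.h>
  #include <sys/types.h>

  struct simple_req {
      int    is_write;
      off_t  off;
      size_t len;
      char  *buf;
  };

  int read_request(int sock, struct simple_req *req);              /* placeholder */
  int send_reply(int sock, const struct simple_req *req, int err); /* placeholder */

  static void serve_sync(int sock, int fd)
  {
      struct simple_req req;

      while (read_request(sock, &req) == 0) {               /* 1) read a request  */
          ssize_t n = req.is_write
              ? pwrite(fd, req.buf, req.len, req.off)       /* 2) "disk" is really */
              : pread(fd, req.buf, req.len, req.off);       /*    the buffer cache */
          send_reply(sock, &req, n < 0 ? -1 : 0);           /* 3) reply, goto 1)  */
      }
  }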

What bnbd-server does is also "a loop" of:

  1) read as many requests from the socket as possible
  2) submit as many AIO requests as possible to the disk via io_submit()
  3) collect as many AIO replies as possible via io_getevents()
  4) send as many replies as possible via writev()

Many threads are involved although it is all doable in a single thread.
It appears that the NBD driver does a really good job when it comes to
batching requests...
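
For illustration only, here is a rough sketch of one pass of such a
batch loop using Linux AIO via libaio -- my own simplification, not
bnbd-server's code; the socket parsing, reply headers, batch size and
offsets below are all made up, and the AIO context is assumed to have
been created with io_setup() beforehand:

  /* One pass of a batched AIO loop (libaio, link with -laio).
   * Illustrative only; request decoding and reply headers are elided. */
  #include <libaio.h>
  #include <sys/uio.h>

  #define BATCH 64
  #define BLKSZ 4096

  static void one_pass(io_context_t ctx, int sock, int fd)
  {
      struct iocb iocbs[BATCH], *ptrs[BATCH];
      struct io_event events[BATCH];
      static char bufs[BATCH][BLKSZ] __attribute__((aligned(BLKSZ)));
      struct iovec iov[BATCH];
      int i, nreq, ndone;

      /* 1) read as many requests from the socket as possible (elided);
       *    pretend we got a full batch of 4 KiB reads here */
      nreq = BATCH;

      /* 2) submit them all to the disk in a single io_submit() call */
      for (i = 0; i < nreq; i++) {
          io_prep_pread(&iocbs[i], fd, bufs[i], BLKSZ, (long long)i * BLKSZ);
          ptrs[i] = &iocbs[i];
      }
      io_submit(ctx, nreq, ptrs);

      /* 3) collect as many completions as are ready */
      ndone = io_getevents(ctx, 1, BATCH, events, NULL);

      /* 4) send all the replies back in one writev() */
      for (i = 0; i < ndone; i++) {
          iov[i].iov_base = events[i].obj->u.c.buf;
          iov[i].iov_len  = events[i].obj->u.c.nbytes;
      }
      if (ndone > 0)
          writev(sock, iov, ndone);
  }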

The bnbd-server has 3 possible 'sync' settings (see the sketch after
the list):

  1) sync = 0 -- open the underlying volumes with O_DIRECT only; this is
the default.  The only 'async' behaviour which may happen here is sparse
file metadata updates, which are async -- this is only relevant when it
comes to allocating new blocks, so if you want to play super-safe, then
dd your volume first!

  2) sync = 1 -- open the underlying volumes with O_DIRECT and O_SYNC,
pretty obvious.

  3) sync = 2 -- open the underlying volumes with O_DIRECT only, but
issue an fsync() or fdatasync() before sending any batch of replies
that contains at least one write request.

  4) it is possible that once I add an allocated-blocks index (required
to make the mirror resync process robust) there will be another value
for the sync option which would trigger an fsync() or fdatasync() call
before sending back a batch of replies containing at least one write
request which caused a block to be allocated... not sure about it at
this point.
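
To make the three existing settings a bit more concrete, a hedged
sketch of how they might map to open(2) flags and a per-batch flush
(the enum and function names are mine, not bnbd-server's configuration
code):

  /* Possible mapping of the sync settings; names are illustrative only. */
  #define _GNU_SOURCE            /* for O_DIRECT */
  #include <fcntl.h>
  #include <unistd.h>

  enum sync_mode { SYNC_0 = 0, SYNC_1 = 1, SYNC_2 = 2 };

  static int open_volume(const char *path, enum sync_mode mode)
  {
      int flags = O_RDWR | O_DIRECT;   /* sync = 0: O_DIRECT only (default) */

      if (mode == SYNC_1)
          flags |= O_SYNC;             /* sync = 1: O_DIRECT | O_SYNC */

      return open(path, flags);
  }

  static void maybe_flush(int fd, enum sync_mode mode, int batch_has_write)
  {
      /* sync = 2: flush once per reply batch that contains a write */
      if (mode == SYNC_2 && batch_has_write)
          fdatasync(fd);
  }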

I ran some benchmarks on my old SSD, see for yourself what the
differences are:

  http://belo.io/misc/patriot-deadline-ext4/sequential.html

Yes, I am planning to add some trivial caching to improve the read
path, but it is not at the top of my TODO list...


> > I do expect that buffer cache flushes on the client side
> > actually hit at least the underlying device's write cache (nearly sync
> > writes on the server side),
> 
> With the FUA patch (which finally has been accepted into Linux proper),
> this will happen; when the client side does an explicit buffer cache
> flush, this will be finished with an FUA call, which will cause an
> fsync() on the server side.
> 
> The effect will be the same, except that the system will be better,
> performance wise.

We shall see...
Still, what about e.g. AIO on top of NBD devices?
I thought FUA was dropped and FLUSH was introduced instead -- is that
what you meant?
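
Either way, on the server side the quoted behaviour comes down to some
form of fsync(); a rough sketch of what I have in mind (the REQ_*
values below are placeholders of mine, not the actual NBD wire
encoding):

  /* Honouring a client flush vs. a per-request "force unit access"
   * on the server side.  REQ_* values are placeholders. */
  #include <unistd.h>
  #include <sys/types.h>

  #define REQ_WRITE 1
  #define REQ_FLUSH 2
  #define REQ_FUA   0x100   /* hypothetical FUA flag */

  static int handle_request(int fd, int type, int flags,
                            const void *buf, size_t len, off_t off)
  {
      if (type == REQ_FLUSH)
          return fdatasync(fd);         /* explicit flush from the client */

      if (type == REQ_WRITE) {
          if (pwrite(fd, buf, len, off) != (ssize_t)len)
              return -1;
          if (flags & REQ_FUA)
              return fdatasync(fd);     /* persist before replying */
      }
      return 0;
  }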


> > specifically when client-side applications call fsync(), and I do
> > expect the server not to block for a long time when the server-side VM
> > will block it while flushing data to disk, causing clients to time
> > out...
> 
> Have you ever seen that to be a problem in practice?

Yeah, I'm pretty sure I have, but I gave up on nbd-server testing months
ago...


> [...]
> > I have never claimed that bnbd-server is a replacement for the original
> > nbd-server -- they are both two different beasts... and mine is just a
> > different approach to the whole NBD thing...
> 
> Not really, in my opinion.

Why don't you read the code first? ;-)
I have read yours...


-- 
Michal Belczyk Sr.


