
Re: FYI: Go implementation of the NBD protocol

On 11/28/18 4:23 AM, Richard W.M. Jones wrote:
On Tue, Nov 27, 2018 at 04:22:02PM +0100, Axel Wagner wrote:
Hi Richard,

no, I have only tested against nbd-{client,server} and the Linux kernel
implementation. Compatibility simply hasn't been a huge priority for me :)

Personally, it seems more efficient to me to have one reference
implementation and testsuite to run against new implementations, than to
require each new implementation to build a new testsuite for each existing
one. For example, I don't know nbdkit at all and know very little about
qemu. The thought of having to figure out how to run a client/server of
each and actually observe the outcomes of a testsuite seems… dreadful.
Whereas if you'd give me a binary that I can just point at my server and it
gives me a list of protocol-violations, I'd be fine to fix them all.

I don't disagree but the chances of us having a reference
implementation which fully tests the protocol any time soon are slim.
In the meantime testing against lots of clients/servers is the best bet.

Agreed. qemu-nbd has a Python script that simulates an intentionally broken server (early disconnects and/or intentionally wrong bytes) at strategic points during the initial handshake and the first client request, in order to test client robustness against flaky servers (qemu.git/tests/qemu-iotests/nbd-fault-injector.py), but it does not have a client counterpart, and it is sadly out of date (it doesn't know NBD_OPT_GO, for example).

In my experience, the most common server bugs are failures to handle the NBD_OPT_ length field correctly, both for known options (did you check for a client sending a length when it shouldn't, and after reporting the error are you still in sync to continue reading the next option from the client?) and for unknown options (clients will want to probe you for support of extensions, and this probing MUST NOT kill the connection, whether or not the client sent a payload). I recall fixing bugs in that category in all three of qemu-nbd, nbd-server, and nbdkit ;) Most clients that can get into transmission phase tend to be well-behaved, so testing that a server is robust against an ill-behaved client is harder.

For reference here are the commands to test against qemu, qemu-nbd and

Also, I don't know if you've implemented TLS support yet, but that's another tricky thing to get right, and we can help you with command lines for the same three projects with TLS support.

And, in a quick read of your project's README, you mention that it is designed to make it easy to implement arbitrary block mode failures. The nbdkit implementation has a similar mode of operation already, and Rich even has a recent video he made with that in action (in his video, he is running 5 NBD disks coupled to a tcl visualization, to demonstrate graphically which portions of the disks the kernel is touching, and to show what happens during the hot-failover of a RAID5 setup when one of the devices starts giving errors). It might be interesting to compare designs.

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
