
Re: [Nbd] tech documentation



On Thu, Oct 19, 2006 at 03:19:36PM +0200, Wouter Verhelst wrote:

| > It _MAY_ be of benefit to NBD (one of the reasons I wanted to know
| > more about the protocol).  It can allow multiple requests that are
| > order independent to be queued from the client to the server and not
| > slow down all of them if one packet is lost, while the lost one is
| > being recovered by the peer stacks.  Order is retained within a
| > stream, but is not enforced between them.
| 
| Yes, that could be a benefit, though I'm not quite sure it would in fact
| have a positive effect on NBD throughput: most NBD client requests are
| to read or write 1024 bytes (which is not enforced anywhere, it's just
| the default); there would probably be some additional overhead in
| creating a new SCTP stream for each of those requests, I guess? That
| overhead might outweigh the benefit you get from not having to stop the
| TCP stream cold every time a packet gets lost. You could of course
| increase the number of requests per stream, but that would require quite
| some additional bookkeeping (not sure that's worth it), or increase the
| request size (but then you increase the average amount of useless data
| that gets pumped over the network, not very good for throughput).

Looks like the first request in a new connection I just made was for 16384
bytes.  Well, maybe not the request, but that much was sent in the reply.

WRT your dissector request, I already have a tool that could be extended
to do that.  It is a TCP relayer daemon.  I use it for a lot of things.  It
has a capture option that records the raw traffic.  There is a capture
formatter that just gives traffic direction, length, and content, formatted
for visual display (characters, escape codes, octal bytes).  I could make a
capture formatter that understands NBD.
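
As a rough illustration of that kind of visual formatting, here is a small
Python sketch.  It is not taken from tcprelayout; the function name
visualize() and the exact escaping rules are assumptions inferred from the
capture lines shown in this message (NUL as \0, low control bytes as ^letter,
other non-printable bytes in octal).  How the real tool treats bytes 0x1b to
0x1f, 0x7f, quotes, and backslashes is not visible in the capture, so the
sketch simply falls back to three-digit octal for those.

```python
def visualize(data: bytes) -> str:
    """Render raw bytes in a quoted, human-readable form (hypothetical rules)."""
    parts = []
    for b in data:
        if b == 0:
            parts.append("\\0")                      # NUL shown as \0
        elif 1 <= b <= 26:
            parts.append("^" + chr(ord("a") + b - 1))  # 0x13 -> ^s
        elif 0x20 <= b <= 0x7E and b not in (0x22, 0x5C):
            parts.append(chr(b))                     # plain printable ASCII
        else:
            parts.append("\\%03o" % b)               # assumed octal fallback
    return '"' + "".join(parts) + '"'


if __name__ == "__main__":
    # The first four bytes of each "Send" line in the capture.
    print(visualize(bytes([0x25, 0x60, 0x95, 0x13, 0x00])))
```
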

Anyway, playing around with it, I see the following:

root@...102...:/root 325> tcprelayout -et < /tmp/capture-nbd/2006/10/19/155902-839113-12317 | cut -c 1-80
1161273542.852094 Accept from [169.254.38.8]:3843
1161273542.853518 Connect to  [127.0.0.1]:9000
1161273542.853803 < Recv  152 "NBDMAGIC\0\0B^b\201\206^rS\0\0\0^r\241\361`\0\0\0
1161273623.350142 Send >   28 "%`\225^s\0\0\0\0\360}\224\301\220;\343\327\0\0\0\
1161273623.370969 < Recv 16384 "gDf\230\0\0\0\0\360}\224\301\220;\343\327\372\35
1161273623.372021 < Recv   16 "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
1161274008.813483 Send >   28 "%`\225^s\0\0\0\0^d<\307\306\220;\343\327\0\0\0\0\
1161274008.813861 < Recv 4112 "gDf\230\0\0\0\0^d<\307\306\220;\343\327\0\0\0\0\0
1161275686.936456 Send >   28 "%`\225^s\0\0\0^bL\276^t\304(H\361\277\0\0\0\0\0\0
1161275686.937025 < Recv  EOF
1161275686.938856 Send >  EOF
1161275686.938936 END
root@...102...:/root 326>

The relayer is running on the server host in this case, so it connects to the
localhost at 127.0.0.1.  The first activity was the result of doing this
command on the client host:

dd if=/dev/nb0 of=/dev/null bs=512 count=1

The idea was to see how much would be requested with a minimal request from
the userland process that opened /dev/nb0 directly.  It seems to have gotten
16384+16 bytes.  Since I see only one request, I assume either that much was
asked for by the client, or the server just decided to reply with it anyway.
In either case, it seems the server has a maximum buffer size of 16384 for
its writes, since the "lo" interface can handle even more.  The next command I
did was:

dd if=/dev/nb0 of=/dev/null skip=32 bs=512 count=1

and that got what appears to be 16 bytes of header and 4096 bytes of data.
Thereafter, I disconnected the device.
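
The sizes above fit the classic NBD wire layout: a 28-byte request (magic
0x25609513, 4-byte type, 8-byte handle, 8-byte offset, 4-byte length) and a
16-byte reply header (magic 0x67446698, 4-byte error, 8-byte handle) followed
by the data for a read.  A minimal Python sketch of how a formatter might
decode those headers follows.  The function names and the returned dict shape
are made up for illustration and are not part of tcprelay or nbd-server.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513   # appears as %`\225^s in the capture
NBD_REPLY_MAGIC = 0x67446698     # appears as gDf\230 in the capture
COMMANDS = {0: "READ", 1: "WRITE", 2: "DISC"}

# All fields are big-endian (network byte order), with no padding.
REQUEST = struct.Struct(">II8sQI")   # magic, type, handle, offset, length
REPLY = struct.Struct(">II8s")       # magic, error, handle


def parse_request(buf: bytes) -> dict:
    """Decode one 28-byte request header from the start of buf."""
    if len(buf) < REQUEST.size:
        raise ValueError("short request header")
    magic, cmd, handle, offset, length = REQUEST.unpack_from(buf)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad request magic 0x%08x" % magic)
    return {
        "command": COMMANDS.get(cmd, "UNKNOWN(%d)" % cmd),
        "handle": handle.hex(),
        "offset": offset,
        "length": length,
    }


def parse_reply(buf: bytes) -> dict:
    """Decode one 16-byte reply header from the start of buf."""
    if len(buf) < REPLY.size:
        raise ValueError("short reply header")
    magic, error, handle = REPLY.unpack_from(buf)
    if magic != NBD_REPLY_MAGIC:
        raise ValueError("bad reply magic 0x%08x" % magic)
    return {"error": error, "handle": handle.hex()}


def expected_reply_bytes(request: dict) -> int:
    """Reply header plus payload; only READ replies carry data."""
    data = request["length"] if request["command"] == "READ" else 0
    return REPLY.size + data
```

Under this layout a 4096-byte read should produce 16 + 4096 = 4112 bytes of
reply, which is what the second Recv line shows, and a 16384-byte read should
produce 16400 bytes, which is consistent with the 16384-byte and 16-byte Recv
lines taken together.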

Obviously, there is no security in this.  The administrator is clearly
responsible for confining the connection to a safe and trusted network.


| > FYI, I am planning to look into implementing an NBD client directly into
| > the QEMU emulator in system mode.  This would allow an emulated system
| > to have one or more block devices mapped onto the network block devices
| > served from nbd-server anywhere without using NBD in the host kernel.
| > And the nbd-server can even be on the same machine running QEMU without
| > the deadlock issues (nbd-server running in the guest OS under QEMU would,
| > of course, still have those issues).  This is another reason I wanted to
| > see technical protocol details.
| 
| Right. I hope the blog post I pointed you to is sufficient; there really
| isn't much else.

It does seem quite simple.  I'll try first to make a dissector in the form
of a formatter for my tcprelay program, and use that to build up an
understanding of what is happening.

What form of dissector would you have wanted:

1.  A modified nbd-server that stays in foreground and outputs all the
    data to stdout (a debug extra verbose option)?

2.  A program that dissects from strace output of nbd-server activity?

3.  A program that works with libpcap to capture packets, reconstruct
    the TCP stream, and dissect whatever looks like NBD?

4.  Something to intercept the connection like my tcprelay program does,
    but stay in foreground and output dissected/formatted info?

5.  Something to dissect/format captured data from a tcprelay intercept
    program?

A disadvantage of an intercept program is that the program has to be
started before setting up the device connection, and the connection has
to be taken down to stop capturing.

A disadvantage of a packet capture (e.g. via libpcap or the like) is
that the implementation has to understand IP and TCP enough to put the
correct octet stream back together before any NBD dissecting can even
take place.
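
To give a feel for that reassembly step, here is a deliberately simplified
Python sketch that puts the payloads of one direction of a TCP connection
back in order by sequence number.  It is an illustration only: it assumes
the (seq, payload) pairs have already been extracted from the packets,
ignores sequence-number wraparound and SYN/FIN accounting, and the function
name reassemble() is hypothetical.

```python
def reassemble(segments, initial_seq):
    """Return the in-order byte stream from (seq, payload) pairs.

    segments may arrive out of order, duplicated, or overlapping.
    Data beyond a gap that is never filled is left out of the result.
    """
    pending = []              # segments that cannot be appended yet
    next_seq = initial_seq    # sequence number of the next byte wanted
    out = bytearray()
    for segment in segments:
        pending.append(segment)
        progressed = True
        while progressed:     # keep draining while something fits
            progressed = False
            for item in list(pending):
                seq, payload = item
                end = seq + len(payload)
                if end <= next_seq:
                    pending.remove(item)            # pure retransmission
                elif seq <= next_seq:
                    out += payload[next_seq - seq:]  # skip overlapped part
                    next_seq = end
                    pending.remove(item)
                    progressed = True
    return bytes(out)


if __name__ == "__main__":
    # Out-of-order arrival with one duplicate segment.
    segs = [(100, b"NBD"), (106, b"IC"), (103, b"MAG"), (103, b"MAG")]
    print(reassemble(segs, 100))
```

Only once the stream is contiguous like this can the NBD headers be located
reliably, since a header may straddle two segments.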

A disadvantage of a modified server is you can only do the monitoring
and dissecting on the server host.

Two disadvantages of an strace on the server are that it also must be
run on the server and that it has to unformat the strace output.

BTW, I have been considering writing my tcprelay program to actually do
the formatting in real time and write formatted output to the capture file.
Maybe I can toss in an "--nbd" option to have it format/dissect for NBD.

Current source code for tcprelay is in my LIBH package:

http://libh.slashusr.org/source/misc/src/sbin/tcprelay/tcprelay.c
http://libh.slashusr.org/source/misc/src/bin/tcprelayout/tcprelayout.c

-- 
-----------------------------------------------------------------------------
| Phil Howard KA9WGN       | http://linuxhomepage.com/      http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------


