
Re: [Nbd] nbd-client hangs on negotiation - possible NBD bug



On Tue, Jul 25, 2006 at 03:20:13PM -0400, Mike Snitzer wrote:
> On 7/24/06, Eric Tessler <maiden1134@...34...> wrote:
> > Hello,
> >
> > I have been using NBD for the last year and I finally managed to break it. I
> > hit a case where the nbd-client hangs during the negotiation phase with the
> > nbd-server. In order to get this, I executed the following steps:
> >
> > 1. nbd-server 63333 /tmp/vol
> > 2. nbd-client 192.168.123.203 63333 /dev/nbd1
> >      Negotiation: ..size = 409664KB
> >      bs=1024, sz=409664
> > 3. nbd-client 192.168.123.203 63333 /dev/nbd2
> >      Negotiation: ..size = 409664KB
> >      bs=1024, sz=409664
> > 4. nbd-client -d /dev/nbd2
> > 5. Repeat step #3 above and the nbd-client hangs with the following output:
> >      Negotiation:
> >
> > From this point onwards, I cannot connect to the NBD share on port 63333.

Are you exporting the volume read-only, or doing something similarly
safe? Attaching the same export to two devices and writing to it through
both is particularly dangerous.


> > The interesting thing is that if I then kill the /dev/nbd1 device
> > (nbd-client -d /dev/nbd1) so there are no more clients using this
> > share, I can then execute steps #2 and #3 above and they both pass
> > without a problem.  I have tried this both locally on the same
> > machine and remotely, the nbd-client hang always occurs. It seems
> > like everything works fine until I disconnect a single client, then
> > I cannot re-connect using a new client.

That is pretty strange, indeed.

I haven't had much time lately to debug this, but I plan to. Some day.

> > I added some debug output to the nbd client and server, the client is
> > connecting to the server through a socket, but the server's call to accept
> > on the socket never returns (as it should when a new client connects to the
> > server). I suspect this may have something to do with improper cleanup of
> > the client socket when the previous client disconnected from the server.
> >
> > Note that if I connect to the share using a single client and disconnect
> > that client, I can reconnect to the share w/o a problem. The problem appears
> > only when I have 2 clients connected to the share at the same time and I
> > disconnect one of them.
> >
> > I am using NBD 2.7.3 with Fedora Core 3, 2.6.11.12 kernel.
> >
> > Does anyone know of a bug that has been fixed that resolves this problem? I
> > am not ready to upgrade to the latest NBD just yet, but I would like to know
> > if this is a known problem and has been fixed.
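
To tell "the server never answers" apart from "the connection is refused"
without involving the kernel nbd device at all, a small probe can help.
What follows is only a rough sketch and not part of nbd: the function name
probe_negotiation and its return strings are made up for illustration. It
assumes the old-style negotiation, in which the server's first 8 bytes are
the ASCII string NBDMAGIC, and treats silence after a successful connect
as the hang described above.

```python
import socket


def probe_negotiation(host, port, timeout=2.0):
    """Connect and wait for the first 8 bytes of the server greeting.

    Returns one of (hypothetical labels, chosen for this sketch):
      "greeted"    - peer sent b"NBDMAGIC"
      "unexpected" - peer sent 8 bytes of something else
      "closed"     - peer closed the connection before sending 8 bytes
      "hung"       - connect or read timed out (the symptom in this thread)
      "refused"    - any other socket error, e.g. nothing is listening
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            data = b""
            while len(data) < 8:
                chunk = s.recv(8 - len(data))
                if not chunk:
                    return "closed"
                data += chunk
            return "greeted" if data == b"NBDMAGIC" else "unexpected"
    except socket.timeout:
        # Must be caught before OSError, since it is a subclass of it.
        return "hung"
    except OSError:
        return "refused"
```

Running it against the export after step 4 (for example
probe_negotiation("192.168.123.203", 63333)) should show whether the
server side still greets new connections; a TCP connect can succeed from
the listen backlog even when accept is never called, which is why the
probe waits for data rather than just for the connection.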

I've been maintaining a separate 2.7 branch for people like yourself, so
that you don't have to update to the latest and spiffiest just yet if
you don't want to. Since it also contains a fix for a buffer overflow
that was reported on this mailing list, I'd really suggest you try it
(even if that buffer overflow isn't too dangerous after all).

> Eric,
> 
> This negotiation problem is still present with the latest nbd release
> (2.8.5) BUT it is _not_ reproducible through the same sequence you
> indicated your 2.7.3 setup reliably hangs on.  I've yet to find a
> reliable reproducer for 2.8.5 negotiation hangs.  I was hopeful that
> the sequence you used would work to reproduce on RHEL4U3.
> 
> With 2.8.5, using RHEL4U3 (2.6.9 + redhatstuff) the negotiation
> problem occurs much more frequently than with a kernel.org 2.6.15+.
> On the surface, it definitely appears to be an nbd-server issue given
> that if you restart the nbd-server all is fine.  But given that a
> kernel change makes this issue less frequent it is far from clear what
> is going on.

There is a performance issue with the I/O buffers on network sockets
filling up at the nbd-server side, because the server is trying to write
to the socket while the client is doing the exact same thing. It's
fairly easy to reproduce with the nbd-tester-client that is currently in
trunk. This issue might be related.
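
The general effect can be shown outside of nbd. The sketch below is an
illustration under assumptions, not nbd code: the helper names are
invented, and it uses a local socketpair rather than a real TCP
connection. Both ends write without ever reading; the sockets are put in
non-blocking mode so that the point where a blocking write would stall
shows up as an error instead of a hang.

```python
import socket


def fill_send_buffer(sock, chunk=b"x" * 4096, limit=64 * 1024 * 1024):
    """Write to sock while the peer never reads.

    Returns (bytes_accepted, stalled). stalled is True when the kernel
    refused more data, which is where a blocking write would sit forever.
    The limit only guarantees that the loop terminates.
    """
    sock.setblocking(False)
    total = 0
    while total < limit:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            return total, True
    return total, False


def both_sides_stall():
    """Let both ends of a connected pair write and neither read."""
    a, b = socket.socketpair()
    try:
        _, stalled_a = fill_send_buffer(a)
        _, stalled_b = fill_send_buffer(b)
        return stalled_a, stalled_b
    finally:
        a.close()
        b.close()
```

If both flags come back true, each side has filled its buffer and is
waiting for the other to read, which is the situation described above
when server and client write at the same time.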

If anyone feels like hunting this bastard down, be my guest ;-)

-- 
Fun will now commence
  -- Seven Of Nine, "Ashes to Ashes", stardate 53679.4


