
Re: [Nbd] found bug in nbd-server.c, handle_info()




On 5/22/2017 2:30 AM, Wouter Verhelst wrote:
> On Fri, May 12, 2017 at 09:53:24AM -0400, Menke, Gregory D. (GSFC-582.0)[Arctic Slope Technical Services, Inc.] wrote:
>> Hi all,
>>
>> I traced the issue some more, it is related to the client side- it
>> appears the client connection to the localhost end of the tunnel drops,
>> but if the tunnel is connected from a different computer on the local
>> subnet, and nbd-client sends its connection thru that, then nbp is stable.
>>
>> So I'm pursing why nbp-client making a connection to a localhost tunnel
>
> nbd, not nbp ;-)

hmm yes indeed :)

>> endpoint is fragile.  I'm going to try ssh tunnels on the local subnet
>> so they are fast, to see if the behavior is related to wan
>> latency/bandwidth or not.
>>
>> In the circumstance of the localhost connection dropping it tends to
>> leave the nbp-client and mount point difficult to close, SIGKILL on the
>> entire stack of related software is sometimes unable to exit the
>> processes so things can be unwound.  When SIGKILL does work then use of
>> the nbp device can be recovered.  It has the appearance of deadlock in
>> the nbp kernel module.
>
> Does this seem more likely to happen under memory stress? Are you
> swapping to the device, or running programs from it?
>
> If so, this might be related to what
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f338fe4540b1d0600b02314c7d885fd358e9eca
> fixed for direct NBD connections. There is little to nothing that can be
> done about that.


I have not seen it obviously associated with memory pressure, though I could have missed such a factor. It is definitely associated with the nbd client connection to localhost. I was able to reproduce the behavior on both my slow WAN ssh tunnel and a fast local-network ssh tunnel; in both cases, once the nbd client's connection goes to another host on the network, behavior is much better. The characteristic symptom remains nbd-client (and nbd-client -d) left dead and unkillable, with the nbd kernel module impossible to unload. I'll try adjusting min_free_kbytes as per the thread and see if the behavior changes.
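
For anyone wanting to compare numbers before and after changing min_free_kbytes, here is a rough sketch of how the current reserve could be checked against free memory. It assumes a Linux host exposing /proc/sys/vm/min_free_kbytes and /proc/meminfo; the function names are made up for illustration, and "headroom" is just MemFree minus the reserve, not a kernel-defined metric.

```python
from pathlib import Path

# Standard Linux procfs locations (assumed present on the test machine).
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")
MEMINFO_PATH = Path("/proc/meminfo")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [kB]" lines; ignore anything else.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def reserve_report(min_free_kb, meminfo):
    """Compare the configured reserve with currently free memory (kB)."""
    free_kb = meminfo.get("MemFree", 0)
    return {
        "min_free_kbytes": min_free_kb,
        "MemFree_kB": free_kb,
        # Hypothetical convenience figure: free memory above the reserve.
        "headroom_kB": free_kb - min_free_kb,
    }


def read_live_report():
    """Read the live values from procfs (Linux only)."""
    min_free_kb = int(MIN_FREE_PATH.read_text().strip())
    return reserve_report(min_free_kb, parse_meminfo(MEMINFO_PATH.read_text()))
```

Calling read_live_report() while the tunnel is under load, before and after raising the value, would at least show whether free memory is getting close to the reserve when the hang occurs.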


Thanks!



