
Re: [Nbd] found bug in nbd-server.c, handle_info()



On Fri, May 12, 2017 at 09:53:24AM -0400, Menke, Gregory D. (GSFC-582.0)[Arctic Slope Technical Services, Inc.] wrote:
> Hi all,
> 
> I traced the issue some more; it is related to the client side. It
> appears the client connection to the localhost end of the tunnel drops,
> but if the tunnel is connected from a different computer on the local
> subnet, and nbd-client sends its connection through that, then nbp is stable.
> 
> So I'm pursuing why nbp-client making a connection to a localhost tunnel

nbd, not nbp ;-)

> endpoint is fragile.  I'm going to try ssh tunnels on the local subnet
> so they are fast, to see whether the behavior is related to WAN
> latency/bandwidth or not.
> 
> When the localhost connection drops, it tends to leave nbp-client and
> the mount point difficult to close; sending SIGKILL to the entire stack
> of related software sometimes fails to make the processes exit so that
> things can be unwound.  When SIGKILL does work, the nbp device can be
> recovered.  It has the appearance of a deadlock in the nbp kernel
> module.

Does this seem more likely to happen under memory stress? Are you
swapping to the device, or running programs from it?

If so, this might be related to what
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f338fe4540b1d0600b02314c7d885fd358e9eca
fixed for direct NBD connections. There is little to nothing that can be
done about that.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12