On Fri, Nov 25, 2016 at 12:44:45AM +0200, Victor wrote:
> Hi Wouter,
>
> It is a ubuntu 16.04, with the following package version:
> ii nbd-client 1:3.13-1 amd64 Network Block Device protocol - client

Good :)
>
> Initramfs is 0.122ubuntu8.5
>
> I have removed the close(nbd), open(nbd) from the source code. It works ok now,
> except for a single small issue: each disconnect/reconnect leaves an
> nbd-client process behind. Here is how it looks after 3 such disconnects.
>
> root 355 0.0 0.2 4372 2312 ? SLs 00:30 0:00 @sbin/nbd-client 10.4.104.4 -N root /dev/nbd0 -swap -persist -systemd-mark
> root 1816 0.0 0.0 4372 344 ? S 00:33 0:00 @sbin/nbd-client 10.4.104.4 -N root /dev/nbd0 -swap -persist -systemd-mark
> root 1842 0.0 0.0 4372 344 ? S 00:34 0:00 @sbin/nbd-client 10.4.104.4 -N root /dev/nbd0 -swap -persist -systemd-mark
> root 1843 0.0 0.0 0 0 ? S< 00:34 0:00 [nbd0]
Hrm.
> The tcp socket is still connected from the original process, pid 355:
>
> root@...2778...:~# netstat -atnp | grep 10809
> tcp 0 0 10.4.104.5:58666 10.4.104.4:10809 ESTABLISHED 355/nbd-client
Actually, it's reconnected, but okay :)
> stracing the processes 1816 and 1842 shows that both are doing a single thing
> continuously:
>
> open("/sys/block/nbd0/pid", O_RDONLY) = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL) = 0
> open("/sys/block/nbd0/pid", O_RDONLY) = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL) = 0
> open("/sys/block/nbd0/pid", O_RDONLY) = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL) = 0
> open("/sys/block/nbd0/pid", O_RDONLY) = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL) = 0
>
> which seems to be related to the following code:
>
> while(check_conn(nbddev, 0)) {
> 	nanosleep(&req, NULL);
> }

Ah, yes. Hrm. That's needed for triggering the partition table reread;
but the workaround we did earlier so that opening /dev/nbdX is no longer
necessary won't work -- the NBD device only triggers a partition table
reread upon an open()/close() of the device, but that's exactly what we
found doesn't work.
I suppose the best way to fix that, then, is to have check_conn() test
if the directory /sys/block/nbdX can be found; if not, we can't test
anything and we should probably just give up.
Not sure if that's really the best thing to do, though.
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12