
Re: [Nbd] nbd-client started at boot (for root device) is not persistent



On Fri, Nov 25, 2016 at 12:44:45AM +0200, Victor wrote:
> Hi Wouter,
> 
> It is Ubuntu 16.04, with the following package version:
> ii  nbd-client  1:3.13-1  amd64  Network Block Device protocol - client
> 
> Initramfs is 0.122ubuntu8.5
> 
> I have removed the close(nbd), open(nbd) from the source code. It works ok now,

Good :)

> except for a single small issue: each disconnect/reconnect leaves an
> nbd-client process behind. Here is how it looks after 3 such disconnects.
> 
> root       355  0.0  0.2   4372  2312 ?        SLs  00:30   0:00 @sbin/nbd-client 10.4.104.4 -N root /dev/nbd0 -swap -persist -systemd-mark
> root      1816  0.0  0.0   4372   344 ?        S    00:33   0:00 @sbin/nbd-client 10.4.104.4 -N root /dev/nbd0 -swap -persist -systemd-mark
> root      1842  0.0  0.0   4372   344 ?        S    00:34   0:00 @sbin/nbd-client 10.4.104.4 -N root /dev/nbd0 -swap -persist -systemd-mark
> root      1843  0.0  0.0      0     0 ?        S<   00:34   0:00 [nbd0]

Hrm.

> The tcp socket is still connected from the original process, pid 355:
> 
> root@...2778...:~# netstat -atnp | grep 10809
> tcp        0      0 10.4.104.5:58666        10.4.104.4:10809        ESTABLISHED 355/nbd-client

Actually, it's reconnected, but okay :)

> Running strace on processes 1816 and 1842 shows that both are doing a single
> thing continuously:
> 
> open("/sys/block/nbd0/pid", O_RDONLY)   = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL)         = 0
> open("/sys/block/nbd0/pid", O_RDONLY)   = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL)         = 0
> open("/sys/block/nbd0/pid", O_RDONLY)   = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL)         = 0
> open("/sys/block/nbd0/pid", O_RDONLY)   = -1 ENOENT (No such file or directory)
> nanosleep({0, 100000000}, NULL)         = 0
> 
> which seems to be related to the following code:
> 
>                         while(check_conn(nbddev, 0)) {
>                                 nanosleep(&req, NULL);
>                         }

Ah, yes. Hrm. That loop is needed for triggering the partition table
reread, but the workaround we did earlier (so that opening /dev/nbdX is
no longer necessary) won't work there: the NBD device only triggers a
partition table reread upon an open()/close() of the device, and that's
exactly what we found doesn't work.

I suppose the best way to fix that, then, is to have check_conn() test
if the directory /sys/block/nbdX can be found; if not, we can't test
anything and we should probably just give up.

Not sure if that's really the best thing to do, though.
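
A rough sketch of that idea, for illustration only. The helper names
(nbd_sysfs_dir_exists, wait_for_nbd_pid), the sysfs_root parameter and
the max_tries bound are all hypothetical and not taken from nbd-client;
the real check_conn() may well be structured differently. The loop gives
up as soon as the sysfs directory for the device is missing, instead of
polling for the pid file forever:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper (not part of nbd-client): returns 1 if
 * <sysfs_root>/<device name> exists and is a directory, 0 otherwise.
 * sysfs_root would normally be "/sys/block". */
static int nbd_sysfs_dir_exists(const char *sysfs_root, const char *devname)
{
	char path[256];
	struct stat st;
	const char *slash;
	int n;

	assert(sysfs_root != NULL && devname != NULL);

	/* Accept "/dev/nbd0" as well as "nbd0". */
	slash = strrchr(devname, '/');
	if (slash != NULL)
		devname = slash + 1;

	n = snprintf(path, sizeof(path), "%s/%s", sysfs_root, devname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path too long: treat as not found */

	if (stat(path, &st) != 0)
		return 0;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* Hypothetical bounded variant of the wait loop.
 * Returns 0 once <sysfs_root>/<dev>/pid exists, -1 if the device
 * directory is missing (nothing to test, give up), and 1 if the pid
 * file still has not appeared after max_tries polls. */
static int wait_for_nbd_pid(const char *sysfs_root, const char *devname,
			    int max_tries)
{
	const struct timespec delay = { 0, 100000000 };	/* 100 ms, as in the strace */
	char path[256];
	struct stat st;
	const char *slash = strrchr(devname, '/');
	const char *name = (slash != NULL) ? slash + 1 : devname;
	int i, n;

	n = snprintf(path, sizeof(path), "%s/%s/pid", sysfs_root, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	for (i = 0; i < max_tries; i++) {
		if (!nbd_sysfs_dir_exists(sysfs_root, devname))
			return -1;	/* no sysfs directory: give up */
		if (stat(path, &st) == 0)
			return 0;	/* pid file is there: connected */
		nanosleep(&delay, NULL);
	}
	return 1;
}
```

Something like wait_for_nbd_pid("/sys/block", nbddev, 50) could then
stand in for the unbounded while loop. Whether a missing directory
should really be treated as "give up" is exactly the open question
above.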

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


