[Nbd] transparent handling of nbd reconnections at kernel level
- To: nbd-general@lists.sourceforge.net
- Subject: [Nbd] transparent handling of nbd reconnections at kernel level
- From: Juan Antonio Martinez <jantonio@...1345...>
- Date: Mon, 20 May 2013 14:16:51 +0200
- Message-id: <1369052211.5975.24.camel@...1346...>
(This is my first post to this list; apologies for my poor English and
for any list rules I may be unaware of.)
Hi, all
I'm trying to create an NBD server cluster by means of several servers
(same image on every one :-) and the keepalived daemon for the virtual
server IP and load balancing.
Everything works fine: connection works, server failover works, load
balancing works... but nbd-client fails on reconnect to the new NBD server.
I use Ubuntu 12.04 with the latest Ubuntu kernel (3.8.21) and nbd-server and
nbd-client from git (version 3.3).
To isolate the problem, I've used this scenario:
* NBD server
* nbd client launched with cmdline:
  jantonio$ sudo /sbin/nbd-client -N ltsp_i386 nbdserver /dev/nbd0 -t 6 -persist -nofork
* On the client:
  mount -r -t squashfs /dev/nbd0 /mnt/nbd
By issuing "service nbd-server stop" and then restarting it, I can see that
nbd-client detects the server failure and reconnects without problems:
........
jantonio@...1346...:~$ sudo /sbin/nbd-client -N ltsp_i386 binubuntu2 /dev/nbd0 -t 6 -persist -nofork
Negotiation: ..size = 5953MB
bs=1024, sz=6243049472 bytes
timeout=6
nbd,4768: Kernel call returned: 32 Reconnecting
Error: Socket failed: Connection refused
Exiting.
Reconnecting
[...]
Error: Socket failed: Connection refused
Exiting.
Reconnecting
Negotiation: ..size = 5953MB
bs=1024, sz=6243049472 bytes
timeout=6
...............
But the mounted squashfs fails if I try to perform any operation
(e.g. "ls /mnt/nbd") during the reconnection process.
I've also tested with "dd" instead of "mount": as soon as the server
socket closes, dd aborts regardless of the "conv=noerror" dd option, instead
of waiting for the reconnect.
In the first test, the mounted filesystem becomes unusable; in the
latter, dd aborts and so cannot complete the operation. In both cases
nbd-client detects the loss of connection and successfully reconnects, but
the kernel module just closes and the device is no longer available if I
try to use it while the reconnection is in progress.
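As a stopgap for the raw-device case only (the dd test, not the mounted
squashfs), the reader itself could retry after an I/O error instead of
aborting. The following is a minimal Python sketch of that idea, not a fix
for the kernel behaviour. The names read_with_retry, make_device_reader,
retries and delay are made up for illustration, and it assumes that reads on
a freshly opened /dev/nbd0 succeed again once nbd-client has reconnected.

```python
import errno
import os
import time


def read_with_retry(read_fn, offset, length, retries=10, delay=1.0,
                    sleep=time.sleep):
    """Call read_fn(offset, length), retrying while it raises EIO.

    read_fn, retries and delay are illustrative names, not part of nbd.
    """
    for attempt in range(retries + 1):
        try:
            return read_fn(offset, length)
        except OSError as exc:
            # Retry only I/O errors, and give up once the budget is spent.
            if exc.errno != errno.EIO or attempt == retries:
                raise
            sleep(delay)


def make_device_reader(path):
    """Return a read_fn that reopens the device on every call.

    Reopening means a descriptor from before the failure is never reused.
    """
    def read_fn(offset, length):
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.pread(fd, length, offset)
        finally:
            os.close(fd)
    return read_fn
```

Usage would look like
read_with_retry(make_device_reader("/dev/nbd0"), 0, 4096), with the retry
count and delay chosen to cover the time a failover is expected to take.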
So my question:
Is there any way to make the nbd kernel module wait until the client
executes the finish_sock() routine to tell it which new socket to talk to,
instead of immediately returning an I/O error?
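To make the requested behaviour concrete, here is a toy user-space model in
Python. It is not kernel code and does not reflect how the nbd module is
implemented; the class NbdDeviceModel and its persist flag are hypothetical
names that only illustrate the difference between failing a request at once
and parking it until a new socket is supplied.

```python
import errno


class NbdDeviceModel:
    """Toy model of the requested behaviour; not kernel code.

    'persist' stands in for a hypothetical flag or ioctl.
    """

    def __init__(self, persist=False):
        self.persist = persist
        self.sock = None      # None means "no server connection"
        self.pending = []     # requests parked while disconnected

    def set_sock(self, sock):
        """Client hands over a new socket; parked requests are replayed."""
        self.sock = sock
        parked, self.pending = self.pending, []
        return [self._send(req) for req in parked]

    def clear_sock(self):
        """The connection to the server was lost."""
        self.sock = None

    def submit(self, req):
        """Return the reply, or None if the request was parked."""
        if self.sock is not None:
            return self._send(req)
        if self.persist:
            # Wait for set_sock() instead of failing the request.
            self.pending.append(req)
            return None
        # Behaviour observed in the tests: immediate I/O error.
        raise OSError(errno.EIO, "no socket")

    def _send(self, req):
        # In this model the "socket" is just a callable.
        return self.sock(req)
```

With persist=False a request submitted while disconnected raises EIO, which
matches what dd and the mounted filesystem see now. With persist=True the
request stays queued and is answered once set_sock() is called.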
Perhaps a new ioctl() or nbd flag option to tell the kernel that the client
is in "persist mode", so that it waits instead of returning?
Thanks in advance
Juan Antonio