
Re: [Nbd] how does NBD handle disconnections?



On Tue, Jul 31, 2007 at 12:17:24PM -0400, Francis Giraldeau wrote:
> Wouter Verhelst wrote:
> > On Mon, Jul 30, 2007 at 10:28:58PM -0400, Francis Giraldeau wrote:
> >   
> >> Hi,
> >>
> >> I would like to use the -persist option of NBD. I thought that using
> >> this option would make nbd-client reconnect if the corresponding
> >> nbd-server is killed, but it didn't.
> >
> > Nevertheless, that is the behaviour that nbd-client tries to achieve.
> >
> > How is it failing? Could you provide a bit more detail?
> >
> As you may know, LTSP is using NBD to replace NFS root. I'm concerned
> about high availability. It was easy to set up a highly available NFS
> server with heartbeat. The problem with NBD is that one process is
> started for each connection on one server. If a failover occurs, the
> client must start a new process on the other server. The client should
> tolerate a disconnection and keep trying to create a new connection
> until it succeeds.

Yes, I agree. This isn't happening now, because the kernel doesn't block
access to the block device when a connection drops; instead it returns
read or write errors immediately. So there's a race between the client
reconnecting and anything touching the device, and it will almost always
trigger. However, if you do use -persist, the client will reconnect
almost immediately; if nothing accesses the device in the meantime, you
won't notice the drop.
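To make the race concrete, here is a toy model in Python. It is only an
illustration of the behaviour described above, not the kernel's actual
logic; ToyNbdDevice and run are names made up for this sketch. A read that
lands between the drop and the reconnect fails at once, while a read after
the reconnect never sees the drop.

```python
import errno


class ToyNbdDevice:
    """Toy stand-in for an nbd block device (hypothetical, for illustration)."""

    def __init__(self):
        self.connected = True

    def read(self):
        # Behaviour as described above: no waiting, fail immediately.
        if not self.connected:
            raise OSError(errno.EIO, "connection dropped")
        return b"data"


def run(events):
    """Replay 'drop', 'reconnect' and 'read' events in the given order."""
    dev = ToyNbdDevice()
    results = []
    for ev in events:
        if ev == "drop":
            dev.connected = False
        elif ev == "reconnect":
            dev.connected = True
        else:  # any other event is treated as a read
            try:
                dev.read()
                results.append("ok")
            except OSError:
                results.append("EIO")
    return results
```

With the events "read, drop, reconnect, read" both reads succeed; with
"read, drop, read, reconnect, read" the middle read gets an I/O error.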

I talked to the kernel nbd maintainer a while back, and we agreed that
he'd change the module at some point so that it'd block read or write
access to an nbd device until the process that set up the connection in
the first place exits. That process could then do whatever it wants, including
reconnecting, or trying a different server if more have been specified
to it. AFAIK, those changes haven't happened yet; I'm sure patches are
welcome.
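As a rough illustration of what such a userspace process could do once the
kernel blocks instead of failing, here is a minimal Python sketch of
"reconnect, or try a different server". It is not how nbd-client is
implemented. The function name, its parameters and the retry policy are
all assumptions made for this example, and it only covers opening the TCP
connection, not the NBD negotiation or handing the socket to the kernel.

```python
import socket
import time


def connect_with_failover(servers, retry_delay=1.0, max_rounds=None,
                          connect=socket.create_connection, sleep=time.sleep):
    """Try each (host, port) in turn until one accepts a TCP connection.

    Hypothetical helper: returns whatever `connect` returns (a socket by
    default). Retries forever unless max_rounds is given, in which case
    ConnectionError is raised after that many passes over the list.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for addr in servers:
            try:
                return connect(addr)
            except OSError:
                continue  # this server is unreachable, try the next one
        rounds += 1
        sleep(retry_delay)  # every server failed; wait before the next pass
    raise ConnectionError("no server reachable")
```

The connect and sleep parameters exist only so the loop can be exercised
without a network; in real use they would be left at their defaults.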

In the meantime, the best solution to this problem is "don't restart the
server". Yes, that sucks majorly.

[...]
-- 
<Lo-lan-do> Home is where you have to wash the dishes.
  -- #debian-devel, Freenode, 2004-09-22


