Re: [Nbd] how does NBD handle disconnections?

To: Wouter Verhelst <w@...112...>
Cc: nbd-general@lists.sourceforge.net
Subject: Re: [Nbd] how does NBD handle disconnections?
From: Tomasz Chmielewski <mangoo@...119...>
Date: Thu, 17 May 2007 15:03:24 +0200
Message-id: <464C529C.6060806@...119...>
In-reply-to: <20070517094343.GL23713@...3...>
References: <464AF26C.60500@...119...> <20070517094343.GL23713@...3...>

Wouter Verhelst wrote:

> On Wed, May 16, 2007 at 02:00:44PM +0200, Tomasz Chmielewski wrote:
>> How does NBD handle disconnections?
>
> Not very well.
>
>> For example, we have a nbd device mounted on a client, the client
>> writes, and we kill a server for 15 minutes.
>> What will the client do?
>
> Scream and yell. And die a horrible death.
>
>> Will the process accessing the nbd disk just freeze for 15 minutes
>> (waiting for IO), and happily continue once nbd client can reconnect to
>> the server?
>
> This is the behaviour that I would _want_ to occur;


Good, at least someone thought about it.

> nbd0: Attempted send on closed socket
> end_request: I/O error, dev nbd0, sector 0
>
> Proper reconnects aren't possible, either. The client now has a
> '-persist' option which will prompt it to try to reconnect immediately
> when the device is disconnected; but as the kernel doesn't lock the
> device, that still gives you a (short) window during which I/O errors,
> as above, may occur.
>
> I discussed this with the NBD kernel module's maintainer, and the idea
> is now that the NBD kernel device will be modified so that it will
> indeed lock a device for as long as a nbd-client userspace process has
> the device node open. This will remove the race described above, though
> "killing the server for fifteen minutes" (I assume for a reboot or so?)
> will still be problematic (but then at least fixable). However that
> hasn't been done yet, AFAIK.


Why should 15 minutes of disconnection differ from 1 minute or 2 hours?

Right now, I'm using iSCSI for SAN (really many computers connected,
both workstations and virtualized servers).

I noticed that such disconnections:

- are a single point of failure
- are hard to recover from when you're away
- can lead to data corruption

Normally, the default open-iscsi (iSCSI initiator/client) "disconnect"
timeout is 2 minutes. I increased it with great success - I disconnected
a diskless workstation (disk accessible over iSCSI) for one day (!), and
after putting the network cable back, the tasks unfroze and the machine
was usable again (no data loss, nothing).

Indeed, normally, you'd never want to disconnect for that long -
usually, it'll be a server restart that takes a couple of minutes max.

But there are several more reasons why I like being able to survive for
longer than just a couple of minutes.

Recently, I had a disk failure, which made the iSCSI server/target
inaccessible for a couple of hours. The disks are fine, I suppose; it's
the experimental sata_mv driver that is causing trouble.
Because of that failure, all virtualized servers had their disks
remounted read-only and/or completely inaccessible, and needed a restart
(I didn't have the "disconnect timeout" increased back then).

If they had been able to survive longer, I would have just restarted the
SAN and been done: no data corruption, nothing else to do.



That's why I'd like NBD to be able to handle disconnections better.
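
The behaviour asked for in this thread (hold the I/O while the server is
away and carry on once it is back, optionally giving up after a timeout)
can be sketched in a few lines of userspace Python. This is only an
illustration of the idea, not how the NBD kernel module or nbd-client
work; FlakyServer, HoldingClient, retry_delay and max_wait are all
hypothetical names made up for this example. max_wait plays a role
loosely similar to the open-iscsi "disconnect" timeout mentioned above.

```python
import time


class FlakyServer:
    """Hypothetical stand-in for a block server that can be taken down."""

    def __init__(self):
        self.up = True
        self.blocks = {}

    def write(self, sector, data):
        if not self.up:
            raise ConnectionError("server unreachable")
        self.blocks[sector] = data


class HoldingClient:
    """Holds a write while the server is away instead of failing it.

    max_wait=None means "wait forever"; otherwise the write fails with
    OSError once roughly max_wait seconds of retrying have passed.
    """

    def __init__(self, server, retry_delay=1.0, max_wait=None,
                 sleep=time.sleep):
        self.server = server
        self.retry_delay = retry_delay
        self.max_wait = max_wait
        self.sleep = sleep  # injectable so the sketch can be tested quickly

    def write(self, sector, data):
        waited = 0.0
        while True:
            try:
                self.server.write(sector, data)
                return waited  # seconds spent waiting for the server
            except ConnectionError:
                if self.max_wait is not None and waited >= self.max_wait:
                    raise OSError("I/O error: server gone for too long")
                self.sleep(self.retry_delay)
                waited += self.retry_delay
```

With max_wait=None, a writer simply blocks for as long as the outage
lasts, be it 1 minute, 15 minutes or 2 hours; with a finite max_wait the
write eventually fails, which corresponds to the I/O errors in the log
above.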

> Obviously, if you're interested in working on that, you'd be welcome.
> AIUI, this shouldn't be all that hard for someone knowledgeable about
> kernel internals; but since I'm not such a person, well.

Sorry, the kernel doesn't accept bash/perl/python modules yet, so I
won't help much (I'm good at complaining, though) :)



--
Tomasz Chmielewski
http://wpkg.org
