Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting
- To: Alex Bligh <alex@...872...>
- Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, Jack Kara <jack@...1290...>, hare@...122..., Wouter Verhelst <w@...112...>, Paul Clements <paul.clements@...856...>, Wouter Verhelst <wouter@...825...>
- Subject: Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting
- From: Jan Kara <jack@...1290...>
- Date: Mon, 18 Nov 2013 10:29:29 +0100
- Message-id: <20131118092929.GA3921@...1426...>
- In-reply-to: <C2E9E8AD-752C-4190-BD4F-45A9482FF400@...872...>
- References: <8bf7c5db475eefcf17976a36f892200d@...1427...> <20131112214632.GB31763@...1426...> <7c1b2ca40c3abfe805e9e944f21c7016@...1427...> <20131114075827.GA13554@...1426...> <5285D258.9040808@...112...> <CAECXXi6Vt5gAjv=qkrGzLG3iRjNmjYiYZd7+gCXK860a2tonKg@...18...> <52889084.2080700@...825...> <C2E9E8AD-752C-4190-BD4F-45A9482FF400@...872...>
On Sun 17-11-13 17:19:17, Alex Bligh wrote:
>
> On 17 Nov 2013, at 09:46, Wouter Verhelst wrote:
>
> >>
> >> In order for nbd to seamlessly handle this situation, we'd have to do a
> >> reconnect in-kernel
> >
> > This would be fairly complicated, since all the connection and
> > negotiation currently happens in userspace. I'm not sure I want to go
> > down that route.
> >
> >> (or have a callout to userland to reconnect)
> >
> > That sounds interesting, too. How would you do that?
> >
> >> and
> >> then we'd have to retry any I/Os that may have failed in the meantime
> >> (or just let them fail, but that probably is not as useful).
> >
>
> Would another option be as follows:
>
> 1. When persistency is required, a new persist flag is specified to
> the kernel by the client.
>
> 2. On a connection failure, if the persist flag is set, don't
> clear up and return with a specific error number. The fd is
> still open (as still owned by the process), but (by assumption)
> unusable.
>
> 3. In persist mode, the block device only gets torn down when
> the fd closes / userland process terminates (whichever is
> easier, detection method TBD). Until then all writes block.
>
> 4. A newer nbd client detects the errno in persist mode, opens another
> fd, and calls the NBD_DOIT ioctl passing the old fd as an
> additional parameter (or does a new ioctl first to associate
> the new fd with the old fd). A new kernel then detects this,
> closes the old fd, and 'takes over' the existing block device
> with the new fd.
>
> On an old client, the kernel behaviour is thus unchanged. Similarly
> if persist is not required. If a new client in persist mode crashes
> after step (2), then the block device will still be torn down when
> the process exits.
Just to make it clear, my only comment was that tearing blockdev down
with kill_bdev() is the wrong way to do it (at least from filesystem POV).
NBD should rather put bdev into a state where it returns EIO for anything
you try to do with it after a network failure.
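
To illustrate the "return EIO instead of tearing down" behaviour, here is a minimal userspace sketch. It is only a model: `struct model_dev`, `model_link_lost()` and `model_submit()` are hypothetical names invented for this example and are not part of the nbd driver. The point it shows is that the device object stays alive after the failure and every later request simply fails with -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; not the nbd driver's real structures. */
struct model_dev {
    bool link_down;    /* set once the network connection is lost */
    size_t completed;  /* requests that were accepted before the failure */
};

/* Record the network failure. The device itself is NOT destroyed. */
void model_link_lost(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->link_down = true;
}

/* Submit one request: 0 on success, -EIO once the link is gone. */
int model_submit(struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->link_down)
        return -EIO;
    dev->completed++;
    return 0;
}
```

A caller (standing in for the filesystem here) keeps a valid handle the whole time and sees an ordinary I/O error rather than a device that vanished underneath it.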
If you want some kind of persistency over network failures, you can queue
IO and attempt a reconnect. That heavily reminds me of the situation
dm-multipath solves for traditional Fibre Channel multipathing, so it might
be easiest to stack dm-multipath over NBD and hack the multipath daemon to
understand the specific needs of NBD: instead of switching to a different
Fibre Channel path, it would try to re-establish the network connection. Adding
Hannes to CC, maybe he will know why that would be a bad idea :).
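
The "queue IO and attempt a reconnect" idea can be sketched in userspace roughly as follows. This is an assumption-laden toy model, not dm-multipath or nbd code: `struct queued_dev`, `queued_submit()`, `queued_reconnected()` and the `QUEUE_MAX` bound are all hypothetical names and values chosen for the example. Requests that arrive while the link is down are parked, then replayed in order once the connection is back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 8  /* arbitrary bound chosen for this sketch */

/* Hypothetical model of a device that queues I/O during an outage. */
struct queued_dev {
    bool link_up;
    int pending[QUEUE_MAX];  /* request ids held back while disconnected */
    size_t npending;
    size_t sent;             /* requests handed to the (modelled) server */
    int last_sent;           /* id of the most recently sent request */
};

/* Send the request when connected, otherwise park it in the queue.
 * Returns 0 if sent or queued, -EBUSY if the queue is full. */
int queued_submit(struct queued_dev *dev, int req_id)
{
    assert(dev != NULL);
    if (dev->link_up) {
        dev->sent++;
        dev->last_sent = req_id;
        return 0;
    }
    if (dev->npending == QUEUE_MAX)
        return -EBUSY;
    dev->pending[dev->npending++] = req_id;
    return 0;
}

/* Called after a successful reconnect: replay parked requests in
 * submission order. Returns the number of requests replayed. */
size_t queued_reconnected(struct queued_dev *dev)
{
    assert(dev != NULL);
    size_t replayed = dev->npending;
    dev->link_up = true;
    for (size_t i = 0; i < replayed; i++) {
        dev->sent++;
        dev->last_sent = dev->pending[i];
    }
    dev->npending = 0;
    return replayed;
}
```

A real implementation would also need a policy for a queue that fills up or a reconnect that never succeeds, for example failing the parked requests with EIO after a timeout.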
Honza
--
Jan Kara <jack@...1290...>
SUSE Labs, CR