Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting

To: Alex Bligh <alex@...872...>
Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, Jack Kara <jack@...1290...>, hare@...122..., Wouter Verhelst <w@...112...>, Paul Clements <paul.clements@...856...>, Wouter Verhelst <wouter@...825...>
Subject: Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting
From: Jan Kara <jack@...1290...>
Date: Mon, 18 Nov 2013 10:29:29 +0100
Message-id: <20131118092929.GA3921@...1426...>
In-reply-to: <C2E9E8AD-752C-4190-BD4F-45A9482FF400@...872...>
References: <8bf7c5db475eefcf17976a36f892200d@...1427...> <20131112214632.GB31763@...1426...> <7c1b2ca40c3abfe805e9e944f21c7016@...1427...> <20131114075827.GA13554@...1426...> <5285D258.9040808@...112...> <CAECXXi6Vt5gAjv=qkrGzLG3iRjNmjYiYZd7+gCXK860a2tonKg@...18...> <52889084.2080700@...825...> <C2E9E8AD-752C-4190-BD4F-45A9482FF400@...872...>

On Sun 17-11-13 17:19:17, Alex Bligh wrote:
> 
> On 17 Nov 2013, at 09:46, Wouter Verhelst wrote:
> 
> >> 
> >> In order for nbd to seamlessly handle this situation, we'd have to do a
> >> reconnect in-kernel
> > 
> > This would be fairly complicated, since all the connection and
> > negotiation currently happens in userspace. I'm not sure I want to go
> > down that route.
> > 
> >> (or have a callout to userland to reconnect)
> > 
> > That sounds interesting, too. How would you do that?
> > 
> >> and
> >> then we'd have to retry any I/Os that may have failed in the meantime
> >> (or just let them fail, but that probably is not as useful).
> > 
> 
> Would another option be as follows:
> 
> 1. When persistency is required, a new persist flag is specified to
>    the kernel by the client.
> 
> 2. On a connection failure, if the persist flag is set, don't
>    clear up and return with a specific error number. The fd is
>    still open (as still owned by the process), but (by assumption)
>    unusable.
> 
> 3. In persist mode, the block device only gets torn down when
>    the fd closes / userland process terminates (whichever is
>    easier, detection method TBD). Until then all writes block.
> 
> 4. A newer nbd client detects the errno in persist mode, opens another
>    fd, and calls the NBD_DOIT ioctl passing the old fd as an
>    additional parameter (or does a new ioctl first to associate
>    the new fd with the old fd). A new kernel then detects this,
>    closes the old fd, and 'takes over' the existing block device
>    with the new fd.
> 
> On an old client, the kernel behaviour is thus unchanged. Similarly
> if persist is not required. If a new client in persist mode crashes
> after step (2), then the block device will still be torn down when
> the process exits.
  Just to make it clear, my only comment was that tearing the blockdev down
with kill_bdev() is the wrong way to do it (at least from the filesystem's
POV). NBD should rather put the bdev into a state where, after a network
failure, it returns EIO for anything you try to do with it.
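
To illustrate the "fail everything with EIO" state, here is a toy userspace
model in C. It is only a sketch and not kernel code: struct toy_nbd and the
toy_nbd_*() functions are made-up names, and the real driver would of course
deal with struct request and the block layer instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum toy_op { TOY_READ, TOY_WRITE, TOY_FLUSH };

struct toy_nbd {
	bool link_dead;        /* set once the network connection is lost */
	unsigned long served;  /* requests handed to the (pretend) socket */
};

/*
 * Called on network failure. The device itself stays in place; only the
 * way later requests are answered changes.
 */
static void toy_nbd_link_failed(struct toy_nbd *dev)
{
	assert(dev != NULL);
	dev->link_dead = true;
}

/* Returns 0 on success or a negative errno, kernel style. */
static int toy_nbd_submit(struct toy_nbd *dev, enum toy_op op)
{
	assert(dev != NULL);
	(void)op;              /* every operation is treated alike here */

	if (dev->link_dead)
		return -EIO;   /* device is dead: fail, do not tear down */

	dev->served++;
	return 0;
}
```

The point of the model is that the filesystem keeps talking to a device that
still exists and simply sees I/O errors, which it already knows how to
handle.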

If you want some kind of persistency over network failures, you can queue
IO and attempt a reconnect. That strongly reminds me of the situation
dm-multipath solves for traditional Fibre Channel multipathing, so it might
be easiest to stack dm-multipath over NBD and hack the multipath daemon to
understand the specific needs of NBD: instead of switching to a different
Fibre Channel path, it would try to re-establish the network connection.
Adding Hannes to CC, maybe he will know why that would be a bad idea :).
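
The queue-and-reconnect idea can be sketched as a toy userspace model in C.
This is an illustration under stated assumptions, not dm-multipath or NBD
code: struct toy_mp, the toy_mp_*() functions and the queue limit are all
hypothetical, and requests are reduced to plain integer ids.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MP_QUEUE_MAX 8     /* arbitrary limit for the sketch */

/* Hypothetical model of a single path with a hold-back queue. */
struct toy_mp {
	bool connected;
	int queued[TOY_MP_QUEUE_MAX]; /* request ids parked while down */
	size_t nr_queued;
	size_t nr_sent;               /* requests that reached the server */
};

/* The network connection went away; start parking requests. */
static void toy_mp_disconnected(struct toy_mp *p)
{
	assert(p != NULL);
	p->connected = false;
}

/*
 * Returns 0 if the request was sent, 1 if it was queued for later,
 * or -EIO if the path is down and the queue is already full.
 */
static int toy_mp_submit(struct toy_mp *p, int req_id)
{
	assert(p != NULL);

	if (p->connected) {
		p->nr_sent++;
		return 0;
	}
	if (p->nr_queued == TOY_MP_QUEUE_MAX)
		return -EIO;
	p->queued[p->nr_queued++] = req_id;
	return 1;
}

/* Reconnect succeeded: replay parked requests in submission order. */
static size_t toy_mp_reconnected(struct toy_mp *p)
{
	size_t replayed;

	assert(p != NULL);
	p->connected = true;
	replayed = p->nr_queued;
	p->nr_sent += replayed;    /* pretend each one is resent */
	p->nr_queued = 0;
	return replayed;
}
```

In this model a submitter never sees the outage unless the queue fills up,
which is roughly the behaviour one would want the multipath layer to provide
on top of NBD.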

								Honza
-- 
Jan Kara <jack@...1290...>
SUSE Labs, CR
