Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting
- To: Paul Clements <paul.clements@...856...>
- Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, Wouter Verhelst <w@...112...>, Jack Kara <jack@...1290...>
- Subject: Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting
- From: Wouter Verhelst <wouter@...825...>
- Date: Sun, 17 Nov 2013 10:46:44 +0100
- Message-id: <52889084.2080700@...825...>
- In-reply-to: <CAECXXi6Vt5gAjv=qkrGzLG3iRjNmjYiYZd7+gCXK860a2tonKg@...18...>
- References: <8bf7c5db475eefcf17976a36f892200d@...1427...> <20131112214632.GB31763@...1426...> <7c1b2ca40c3abfe805e9e944f21c7016@...1427...> <20131114075827.GA13554@...1426...> <5285D258.9040808@...112...> <CAECXXi6Vt5gAjv=qkrGzLG3iRjNmjYiYZd7+gCXK860a2tonKg@...18...>
On 15-11-13 22:11, Paul Clements wrote:
> On Fri, Nov 15, 2013 at 2:50 AM, Wouter Verhelst <w@...112...> wrote:
>
>
> > I'm not sure if this has been implemented that way (that's Paul's area,
> > not mine), but the intention was that the nbd kernel module would only
> > do cleanup once the nbd-client process exits.
>
>
> Not quite. It cleans up at the end of NBD_DO_IT ioctl, before returning
> to userland to do the reconnect.
Oh. That's a misunderstanding on my part, then.
> > That is, if nbd-client has
> > not yet exited, that could be because it's in -persist mode and is
> > trying to reconnect.
>
>
> The -persist mode will only work if there is no ongoing I/O. With I/O
> you're likely to get a kernel panic in the filesystem.
Right.
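
To make the failure mode concrete, here is a toy Python model of the sequence described above. It is only a sketch: it is not the kernel module or nbd-client, and every name in it (ToyNbdDevice, do_it_returns, persist_cycle, and so on) is made up for illustration. It assumes that requests in flight are failed when the NBD_DO_IT ioctl ends, and that requests arriving while no socket is set fail immediately.

```python
# Toy model of the current behaviour; nothing here is real nbd code.
class ToyNbdDevice:
    """Hypothetical stand-in for the kernel side of one nbd device."""

    def __init__(self):
        self.connected = False
        self.inflight = []   # requests sent to the server, no reply yet
        self.results = {}    # request id -> "ok" or "EIO"

    def set_sock(self):
        # Userland hands a freshly negotiated socket to the kernel.
        self.connected = True

    def submit(self, req_id):
        # The filesystem issues a request.
        if self.connected:
            self.inflight.append(req_id)
        else:
            self.results[req_id] = "EIO"   # assumed: no socket, fail at once

    def server_replies(self):
        # The server answers everything that is currently in flight.
        for req_id in self.inflight:
            self.results[req_id] = "ok"
        self.inflight.clear()

    def do_it_returns(self):
        # Models the end of the NBD_DO_IT ioctl after the connection drops:
        # cleanup happens here, before control is back in userland.
        for req_id in self.inflight:
            self.results[req_id] = "EIO"
        self.inflight.clear()
        self.connected = False


def persist_cycle(dev, io_during_outage):
    """One disconnect/reconnect round as a -persist client would see it."""
    dev.do_it_returns()              # server went away, kernel cleaned up
    for req_id in io_during_outage:  # the filesystem keeps issuing I/O
        dev.submit(req_id)
    dev.set_sock()                   # userland has reconnected


dev = ToyNbdDevice()
dev.set_sock()
dev.submit("a")
dev.server_replies()
dev.submit("b")            # still in flight when the server reboots
persist_cycle(dev, ["c"])  # "c" arrives during the outage
dev.submit("d")
dev.server_replies()
print(dev.results)
```

In this model only "a" and "d" complete; "b" and "c" come back as errors even though the reconnect itself succeeded, and those errors are what the filesystem ends up seeing.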
> In order for nbd to seamlessly handle this situation, we'd have to do a
> reconnect in-kernel
This would be fairly complicated, since all the connection and
negotiation currently happens in userspace. I'm not sure I want to go
down that route.
> (or have a callout to userland to reconnect)
That sounds interesting, too. How would you do that?
> and
> then we'd have to retry any I/Os that may have failed in the meantime
> (or just let them fail, but that probably is not as useful).
>
>
> The solution that Jack mentions is worth looking into -- it should at
> least avoid the filesystem panics that we now have. I'll take a look...
I'm not sure. This would mean that auto-reconnect couldn't work anymore
-- unless you also do the callout to userland thing you mentioned above.
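
For the sake of discussion, this is roughly how I picture the "callout plus retry" variant, again as a toy Python model rather than anything resembling real kernel code. The names (RequeueingDevice, reconnect_callout, connection_lost) are hypothetical, and the callout is assumed to be a synchronous function that reports success or failure; how it would really be delivered to userland is exactly the open question.

```python
from collections import deque


class RequeueingDevice:
    """Hypothetical device that holds I/O across a reconnect."""

    def __init__(self, reconnect_callout):
        # reconnect_callout stands in for "ask userland to reconnect";
        # it returns True if connection and negotiation succeeded.
        self.reconnect_callout = reconnect_callout
        self.connected = True
        self.pending = deque()   # requests not yet completed
        self.results = {}        # request id -> "ok" or "EIO"

    def submit(self, req_id):
        # Always queue; a request is never failed just because the
        # link happens to be down at this moment.
        self.pending.append(req_id)

    def server_replies(self):
        # Complete queued requests, but only while connected.
        while self.connected and self.pending:
            self.results[self.pending.popleft()] = "ok"

    def connection_lost(self):
        self.connected = False
        if self.reconnect_callout():
            # Pending requests stay queued and are retried on the new link.
            self.connected = True
        else:
            # Giving up: only now do the queued requests fail.
            while self.pending:
                self.results[self.pending.popleft()] = "EIO"


attempts = []

def fake_reconnect():
    # Stand-in for the userland side of the callout.
    attempts.append("reconnect")
    return True

dev = RequeueingDevice(fake_reconnect)
dev.submit("b")          # in flight when the server reboots
dev.connection_lost()
dev.submit("c")
dev.server_replies()
print(dev.results)

gave_up = RequeueingDevice(lambda: False)
gave_up.submit("x")
gave_up.connection_lost()
print(gave_up.results)
```

The point of the sketch is only that errors reach the filesystem when the reconnect is abandoned, not when the connection drops, so auto-reconnect and "no panic under I/O" are not mutually exclusive in principle.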
--
This end should point toward the ground if you want to go to space.
If it starts pointing toward space you are having a bad problem and you
will not go to space today.
-- http://xkcd.com/1133/