
Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting



On 15-11-13 22:11, Paul Clements wrote:
> On Fri, Nov 15, 2013 at 2:50 AM, Wouter Verhelst <w@...112...> wrote:
> 
> 
>     I'm not sure if this has been implemented that way (that's Paul's area,
>     not mine), but the intention was that the nbd kernel module would only
>     do cleanup once the nbd-client process exits. 
> 
> 
> Not quite. It cleans up at the end of the NBD_DO_IT ioctl, before
> returning to userland to do the reconnect.

Oh. That's a misunderstanding on my part, then.
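
To check that I have the sequence right now, here is a rough sketch of the
userland side as I understand it: the blocking call returns (with the kernel
cleanup already done), and only then does the client try to reconnect. This is
a simulation, not the actual nbd-client code. The client_ops hooks, the
fake_link helpers and the max_attempts limit are all hypothetical; in a real
client do_it would wrap ioctl(nbd, NBD_DO_IT) and reconnect would redo the
connect and negotiation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the real client internals. */
struct client_ops {
    int  (*do_it)(void *ctx);     /* blocks until the connection ends; <0 on error */
    bool (*reconnect)(void *ctx); /* true once a new connection has been set up */
};

/*
 * Run the device until it ends cleanly. With persist set, an error return
 * from do_it triggers a reconnect and another do_it call.
 * Returns the number of reconnects, or -1 if reconnecting gave up.
 */
static int run_persistent(const struct client_ops *ops, void *ctx,
                          bool persist, int max_attempts)
{
    assert(ops != NULL && ops->do_it != NULL && ops->reconnect != NULL);
    int reconnects = 0;

    for (;;) {
        int rc = ops->do_it(ctx);
        /* By the time we get here, the kernel side has already cleaned up,
         * so any I/O issued between now and the reconnect has nowhere to go. */
        if (rc >= 0 || !persist)
            return reconnects;

        int tries = 0;
        while (!ops->reconnect(ctx)) {
            if (++tries >= max_attempts)
                return -1;
        }
        reconnects++;
    }
}

/* Fake link for trying the loop out without a real device (assumption). */
struct fake_link {
    int drops_left;           /* how many times do_it should fail */
    int failed_connects_left; /* how many reconnect attempts should fail */
    int do_it_calls;          /* how often do_it was entered */
};

static int fake_do_it(void *ctx)
{
    struct fake_link *l = ctx;
    l->do_it_calls++;
    if (l->drops_left > 0) {
        l->drops_left--;
        return -1;
    }
    return 0;
}

static bool fake_reconnect(void *ctx)
{
    struct fake_link *l = ctx;
    if (l->failed_connects_left > 0) {
        l->failed_connects_left--;
        return false;
    }
    return true;
}
```

The gap between do_it returning and reconnect succeeding is exactly the window
in which ongoing I/O gets into trouble.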

>     That is, if nbd-client has
>     not yet exited, that could be because it's in -persist mode and is
>     trying to reconnect.
> 
> 
> The -persist mode will only work if there is no ongoing I/O. With I/O
> you're likely to get a kernel panic in the filesystem.

Right.

> In order for nbd to seamlessly handle this situation, we'd have to do a
> reconnect in-kernel

This would be fairly complicated, since all the connection and
negotiation currently happens in userspace. I'm not sure I want to go
down that route.

> (or have a callout to userland to reconnect)

That sounds interesting, too. How would you do that?

> and
> then we'd have to retry any I/Os that may have failed in the meantime
> (or just let them fail, but that probably is not as useful).
> 
> 
> The solution that Jack mentions is worth looking into -- it should at
> least avoid the filesystem panics that we now have. I'll take a look...

I'm not sure. This would mean that auto-reconnect couldn't work anymore
-- unless you also do the callout to userland thing you mentioned above.
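
As a rough illustration of the "retry any I/Os" part: the idea would be to put
in-flight requests back on the queue when the connection drops, and to send
them again once a new connection is up, rather than completing them with an
error. The sketch below is a plain userspace simulation under that assumption;
struct req, requeue_in_flight and send_queued are names made up for this
example and do not exist in the kernel driver.

```c
#include <assert.h>
#include <stddef.h>

enum req_state { REQ_QUEUED, REQ_IN_FLIGHT, REQ_DONE };

/* Hypothetical request record; the real driver tracks block requests. */
struct req {
    unsigned long handle; /* identifies the request on the wire */
    enum req_state state;
    int sends;            /* how many times it has been sent */
};

/* Connection dropped: move in-flight requests back to the queue
 * instead of failing them. Returns how many were moved. */
static size_t requeue_in_flight(struct req *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    size_t moved = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].state == REQ_IN_FLIGHT) {
            reqs[i].state = REQ_QUEUED;
            moved++;
        }
    }
    return moved;
}

/* Connection (re)established: send everything that is queued.
 * Returns how many requests were sent. */
static size_t send_queued(struct req *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].state == REQ_QUEUED) {
            reqs[i].state = REQ_IN_FLIGHT;
            reqs[i].sends++;
            sent++;
        }
    }
    return sent;
}
```

Requests that already completed are left alone; only the ones the server never
answered are sent a second time.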

-- 
This end should point toward the ground if you want to go to space.

If it starts pointing toward space you are having a bad problem and you
will not go to space today.

  -- http://xkcd.com/1133/


