Re: [Nbd] nbd: Oops because nbd doesn't prevent NBD_CLEAR_SOCK while sock_xmit() is working on a receive

To: Mike Snitzer <snitzer@...17...>
Cc: nbd-general@lists.sourceforge.net, linux-kernel@...25...
Subject: Re: [Nbd] nbd: Oops because nbd doesn't prevent NBD_CLEAR_SOCK while sock_xmit() is working on a receive
From: Paul Clements <paul.clements@...124...>
Date: Thu, 27 Mar 2008 08:35:55 -0400
Message-id: <47EB94AB.6090608@...124...>
In-reply-to: <170fa0d20803261143s1ab258b2ra470c158ac5744a@...18...>
References: <170fa0d20803261143s1ab258b2ra470c158ac5744a@...18...>

Mike Snitzer wrote:

In practice this looks like:

nbd1: NBD_DISCONNECT
nbd1: Send control failed (result -32)
end_request: I/O error, dev nbd1, sector 0
end_request: I/O error, dev nbd1, sector 8032264
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on nbd1, disabling device.
        Operation continuing on 1 devices
Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
 [<ffffffff88b1e125>] :nbd:sock_xmit+0x9d/0x301

The fact that sock_xmit() in receive mode is unprotected seems to be
the WHY a NULL pointer is possible; but I'm still trying to identify
the HOW.

Do you know who is setting the socket NULL? Is it already NULL when you get to this point? Is it the nbd-client -d? Is it the original nbd-client/kernel that does it? Figuring that out would help narrow down the cause.

But for me this begs the question:  why isn't the nbd_device's socket
always protected during sock_xmit() for both
transmits and receives; rather than just transmits (via tx_lock)!?

It would deadlock if we held the lock over both. Generally we don't have to worry about receives, since they're always done in the nbd-client process, so we have control over when and how it exits and cleans up. The odd case, as you've discovered, is when another process (nbd-client -d) comes along and starts mucking with the queue and socket. Would "kill -9 <nbd-client-pid>" work for you instead? That is what I use to break the connection, and it's safe, as it tells the original nbd-client to exit (which it does cleanly and safely).
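
To make the locking point concrete, here is a rough user-space sketch, not the driver code. Every name in it (model_dev, model_trylock, model_send, model_recv) is made up for illustration, and the lock is a plain flag rather than a real mutex. It assumes the deadlock arises because a receive can sit waiting for a reply while holding the lock, so the send that would produce that reply can never take it. It also shows the unlocked receive path seeing a socket pointer that somebody else has cleared.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; this is not the kernel's nbd device struct. */
struct model_dev {
    bool tx_locked;   /* stands in for tx_lock */
    int *sock;        /* stands in for the device's socket pointer */
};

/* Non-blocking stand-in for taking tx_lock: reports -EBUSY at the point
 * where a real mutex would put the caller to sleep. */
static int model_trylock(struct model_dev *d)
{
    if (d->tx_locked)
        return -EBUSY;
    d->tx_locked = true;
    return 0;
}

static void model_unlock(struct model_dev *d)
{
    d->tx_locked = false;
}

/* Transmit path: holds the lock for the duration of the send. */
static int model_send(struct model_dev *d)
{
    int ret = model_trylock(d);

    if (ret)
        return ret;                 /* a real sender would block here */
    ret = d->sock ? 0 : -EPIPE;     /* no socket, nothing to send on */
    model_unlock(d);
    return ret;
}

/* Receive path: takes no lock, so another process may clear the pointer
 * at any time. Read it once and check it before dereferencing. */
static int model_recv(struct model_dev *d)
{
    int *s = d->sock;               /* single unlocked read */

    if (s == NULL)
        return -EINVAL;
    return *s;
}
```

If a receiver took the lock (model_trylock succeeds) and then waited for a reply, model_send would keep returning -EBUSY until the receiver let go, which in this model it never would. The NULL check in model_recv only marks where the window is; in the kernel a check like that would not necessarily close the race, since the socket could still go away after the check.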


--
Paul
