[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal



On Sat, Sep 04, 2021 at 08:25:19PM +0800, Hou Tao wrote:
> When nbd module is being removing, nbd_alloc_config() may be
> called concurrently by nbd_genl_connect(), although try_module_get()
> will return false, but nbd_alloc_config() doesn't handle it.
> 
> The race may lead to the leak of nbd_config and its related
> resources (e.g, recv_workq) and oops in nbd_read_stat() due
> to the unload of nbd module as shown below:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000040
>   Oops: 0000 [#1] SMP PTI
>   CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>   Workqueue: knbd16-recv recv_work [nbd]
>   RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
>   Call Trace:
>    recv_work+0x3b/0xb0 [nbd]
>    process_one_work+0x1ed/0x390
>    worker_thread+0x4a/0x3d0
>    kthread+0x12a/0x150
>    ret_from_fork+0x22/0x30
> 
> Fixing it by checking the return value of try_module_get()
> in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
> assign nbd->config only when nbd_alloc_config() succeeds to ensure
> the value of nbd->config is binary (valid or NULL).
> 
> Also adding a debug message to check the reference counter
> of nbd_config during module removal.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  drivers/block/nbd.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index cedd3648e1a7..fa6c069b79dc 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1473,15 +1473,20 @@ static struct nbd_config *nbd_alloc_config(void)
>  {
>  	struct nbd_config *config;
>  
> +	if (!try_module_get(THIS_MODULE))
> +		return ERR_PTR(-ENODEV);

try_module_get(THIS_MODULE) is an indicator for an unsafe pattern.  If
we don't already have a reference it could never close the race.

Looking at the callers:

 - nbd_open like all block device operations must have a reference
   already.
 - for nbd_genl_connect I'm not an expert, but given that struct
   nbd_genl_family has a module member I suspect the networkinh
   code already takes a reference.

So this should be able to use __module_get.


Reply to: