
Re: [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal



Hi Christoph,

On 9/9/2021 2:40 PM, Christoph Hellwig wrote:
> On Tue, Sep 07, 2021 at 11:04:16AM +0800, Hou Tao wrote:
>> Let me explain first. The reason it works is due to genl_lock_all() in netlink code.
> Btw, please properly format your mail.  These overly long lines are really
> hard to read.
Thanks for the reminder.
>> If the module removal happens before calling try_module_get(),
>> nbd_cleanup() will call genl_unregister_family() first, and then
>> genl_lock_all(). genl_lock_all() will prevent ops in
>> nbd_connect_genl_ops() from being called, because the calling of nbd
>> ops happens in genl_rcv() which needs to acquire cb_lock first.
> Good.
>
>> I have checked multiple genl_ops, it seems that the reason why these
>> genl_ops don't need try_module_get() is that these ops don't create
>> new object through genl_ops and just control it. However
>> genl_family_rcv_msg_dumpit() will try to call try_module_get(), but
>> according to the history (6dc878a8ca39 "netlink: add reference of
>> module in netlink_dump_start"), it is because inet_diag_handler_cmd()
>> will call __netlink_dump_start().
> And now taking a step back:  Why do we even need this extra module
> reference?  For the case where nbd_alloc_config is called from nbd_open
> we obviously don't need it.  In the case where it is called from
> nbd_genl_connect that prevents unloading nbd when there is a configured
> but not actually nbd device.  Which isn't really needed and counter to
> how other drivers work.
Yes, the purpose of the module ref-counting in nbd_alloc_config() is to
force the user to disconnect the nbd device manually before module removal.
The loop device works the same way: if a file is attached to a loop device,
an extra module reference is taken in loop_configure() and removal of the
loop module fails. The only difference is that the loop driver takes the
extra reference via ioctl, while nbd takes it via netlink.
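
To make the intended behaviour concrete, here is a rough userspace model.
It is only a sketch and not kernel code: the toy_* names are made up for
illustration and stand in for try_module_get(), module_put() and the
refcount check done at module removal.

```c
#include <stdbool.h>

/* Toy stand-in for struct module: a refcount and a "going away" flag. */
struct toy_module {
	int refcnt;
	bool going;
};

/* Models try_module_get(): fails once removal has started. */
bool toy_module_get(struct toy_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): drops one reference. */
void toy_module_put(struct toy_module *m)
{
	m->refcnt--;
}

/* Models rmmod: refused while any reference is still held. */
bool toy_module_remove(struct toy_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->going = true;
	return true;
}
```

In this model a configured device (one toy_module_get()) makes
toy_module_remove() fail until the matching toy_module_put() has run,
which is the behaviour the extra reference in nbd_alloc_config() gives.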
>
> Did you try just removing the extra module refcounting?
Yes, removing the module refcounting in nbd_alloc_config() and cleaning
up the nbd_config in nbd_cleanup() also works, but I am not sure whether
it will break some nbd use case that depends on the extra module
reference. I prefer to keep the extra module refcounting, considering
that the loop driver does it as well. What is your suggestion?

Regards,
Tao


