Re: [PATCH v2 3/3] nbd: fix race between nbd_alloc_config() and module removal
On Tue, Sep 07, 2021 at 11:04:16AM +0800, Hou Tao wrote:
> Let me explain first. The reason it works is due to genl_lock_all() in netlink code.
Btw, please properly format your mail. These overly long lines are really
hard to read.
> If the module removal happens before calling try_module_get(), nbd_cleanup() will
> call genl_unregister_family() first, and then genl_lock_all(). genl_lock_all() will
> prevent ops in nbd_connect_genl_ops() from being called, because the calling
> of nbd ops happens in genl_rcv() which needs to acquire cb_lock first.
Good.
> I have checked multiple genl_ops, it seems that the reason why these genl_ops
> don't need try_module_get() is that these ops don't create new object through
> genl_ops and just control it. However genl_family_rcv_msg_dumpit() will try to
> call try_module_get(), but according to the history (6dc878a8ca39 "netlink: add reference of module in netlink_dump_start"),
> it is because inet_diag_handler_cmd() will call __netlink_dump_start().
And now taking a step back: Why do we even need this extra module
reference? For the case where nbd_alloc_config is called from nbd_open
we obviously don't need it. In the case where it is called from
nbd_genl_connect, it prevents unloading nbd when there is a configured
but not actually opened nbd device. That isn't really needed and runs
counter to how other drivers work.
Did you try just removing the extra module refcounting?