Re: [PATCH] nbd: fix false lockdep deadlock warning
- To: Ming Lei <ming.lei@redhat.com>, Yu Kuai <yukuai1@huaweicloud.com>
- Cc: josef@toxicpanda.com, axboe@kernel.dk, hch@infradead.org, hare@suse.de, linux-block@vger.kernel.org, nbd@other.debian.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, "yukuai (C)" <yukuai3@huawei.com>
- Subject: Re: [PATCH] nbd: fix false lockdep deadlock warning
- From: Nilay Shroff <nilay@linux.ibm.com>
- Date: Wed, 2 Jul 2025 11:52:08 +0530
- Message-id: <7b09167f-bf8d-4d94-9317-3cfbb4f83cd8@linux.ibm.com>
- In-reply-to: <aGSaVhiH2DeTvtdr@fedora>
- References: <20250627092348.1527323-1-yukuai1@huaweicloud.com> <aF56oVEzTygIOUTN@fedora> <c2fbaacc-62a1-4a98-4157-2637b7f242b7@huaweicloud.com> <197b6dca-56be-438d-a60f-21011367c5ed@linux.ibm.com> <99b4afce-de05-ddcb-2634-b19214cf4534@huaweicloud.com> <aGSaVhiH2DeTvtdr@fedora>
On 7/2/25 8:02 AM, Ming Lei wrote:
> On Wed, Jul 02, 2025 at 09:12:09AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/07/01 21:28, Nilay Shroff wrote:
>>>
>>>
>>> On 6/28/25 6:18 AM, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2025/06/27 19:04, Ming Lei wrote:
>>>>> I guess the patch in the following link may be simpler; both take
>>>>> a similar approach:
>>>>>
>>>>> https://lore.kernel.org/linux-block/aFjbavzLAFO0Q7n1@fedora/
>>>>
>>>> I think the above approach has concurrency problems if nbd_start_device
>>>> runs concurrently with nbd_add_socket:
>>>>
>>>> t1:
>>>> nbd_start_device
>>>> lock
>>>> num_connections = 1
>>>> unlock
>>>> t2:
>>>> nbd_add_socket
>>>> lock
>>>> config->num_connections++
>>>> unlock
>>>> t3:
>>>> nbd_start_device
>>>> lock
>>>> num_connections = 2
>>>> unlock
>>>> blk_mq_update_nr_hw_queues
>>>>
>>>> blk_mq_update_nr_hw_queues
>>>> //nr_hw_queues updated to 1 before failure
>>>> return -EINVAL
>>>>
>>>
>>> In the above case, yes, I see that t1 would return -EINVAL (as
>>> config->num_connections doesn't match num_connections),
>>> but then t3 would succeed in updating nr_hw_queues (as
>>> config->num_connections and num_connections are both 2 this
>>> time). Isn't that right? If yes, then the above patch (from Ming)
>>> seems good.
>>
>> Emm, I'm confused. If you agree with the concurrent sequence, then
>> t3 updates nr_hw_queues to 2 first and returns success; later t1 updates
>> nr_hw_queues back to 1 and returns failure.
>
> It should be easy to avoid the failure by simply retrying.
>
Yeah, I think retrying should be a safe bet here.
On another note, synchronizing nbd_start_device and nbd_add_socket
using nbd->task_setup looks more complex than necessary; we could use
nbd->pid to synchronize the two instead. We would need to move the
setting of nbd->pid before the call to blk_mq_update_nr_hw_queues in
nbd_start_device. Then nbd_add_socket can check nbd->pid, and if it's
non-zero, assume that either the nr_hw_queues update is in progress
or the device has already been set up, and return -EBUSY. I think
updating the number of connections once the device is configured
isn't possible anyway, so once nbd_start_device has been initiated we
should prevent the user from adding more connections. If we follow
this approach then IMO we don't need the retry discussed above.
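
To make the nbd->pid idea concrete, here is a rough user-space model of
the ordering I have in mind. It is only a sketch, not a patch against
drivers/block/nbd.c: struct nbd_model, model_add_socket(),
model_start_device() and model_demo() are hypothetical names, a pthread
mutex stands in for config_lock, and a plain assignment stands in for
blk_mq_update_nr_hw_queues(). It also assumes the queue update runs
with the lock dropped, as in the approaches discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the few nbd_device fields this sketch needs. */
struct nbd_model {
	pthread_mutex_t config_lock;
	int pid;             /* 0 until a start has claimed the device */
	int num_connections; /* sockets added so far */
	int nr_hw_queues;    /* what the queue update would have applied */
};

/* Models nbd_add_socket(): refuse new sockets once a start has begun. */
int model_add_socket(struct nbd_model *nbd)
{
	int ret = 0;

	pthread_mutex_lock(&nbd->config_lock);
	if (nbd->pid)
		ret = -EBUSY; /* start in progress or device already set up */
	else
		nbd->num_connections++;
	pthread_mutex_unlock(&nbd->config_lock);
	return ret;
}

/* Models nbd_start_device(): claim the device via pid before the update. */
int model_start_device(struct nbd_model *nbd, int pid)
{
	int num_connections;

	pthread_mutex_lock(&nbd->config_lock);
	if (nbd->pid) {
		pthread_mutex_unlock(&nbd->config_lock);
		return -EBUSY;
	}
	if (!nbd->num_connections) {
		pthread_mutex_unlock(&nbd->config_lock);
		return -EINVAL;
	}
	nbd->pid = pid; /* set first, so num_connections is now frozen */
	num_connections = nbd->num_connections;
	pthread_mutex_unlock(&nbd->config_lock);

	/* Stand-in for blk_mq_update_nr_hw_queues(), called unlocked. */
	nbd->nr_hw_queues = num_connections;
	return 0;
}

/* Single-threaded walk through the ordering; returns nr_hw_queues. */
int model_demo(void)
{
	struct nbd_model nbd = { .config_lock = PTHREAD_MUTEX_INITIALIZER };

	assert(model_add_socket(&nbd) == 0);
	assert(model_start_device(&nbd, 1234) == 0);
	assert(model_add_socket(&nbd) == -EBUSY); /* late socket rejected */
	return nbd.nr_hw_queues;
}
```

Because the pid is set under the lock before the snapshot of
num_connections is taken, a second starter or a late nbd_add_socket
sees a non-zero pid and backs off, so the snapshot cannot go stale
while the queue update runs.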
Thanks,
--Nilay