Re: blktests failures with v6.17-rc1 kernel
On Thu, Aug 28, 2025 at 05:55:06AM +0000, Shinichiro Kawasaki wrote:
> On Aug 27, 2025 / 12:10, Daniel Wagner wrote:
> > On Wed, Aug 13, 2025 at 10:50:34AM +0000, Shinichiro Kawasaki wrote:
> > > #4: nvme/061 (fc transport)
> > >
> > > The test case nvme/061 sometimes fails for fc transport due to a WARN and
> > > refcount message "refcount_t: underflow; use-after-free." Refer to the
> > > report for the v6.15 kernel [5].
> > >
> > > [5]
> > > https://lore.kernel.org/linux-block/2xsfqvnntjx5iiir7wghhebmnugmpfluv6ef22mghojgk6gilr@mvjscqxroqqk/
> >
> > This one might be fixed with
> >
> > https://lore.kernel.org/linux-nvme/20250821-fix-nvmet-fc-v1-1-3349da4f416e@kernel.org/
>
> I applied this patch on top of the v6.17-rc3 kernel, but I still observe the
> refcount WARN at nvme/061.
Thanks for testing. I was able to reproduce it as well. The problem is
that an association can be scheduled for deletion twice. Would you mind
giving the attached patch a try? It fixes the problem I was able to
reproduce.
> That said, I like the patch. This week, I noticed that nvme/030 hangs with the
> fc transport. The hang is rare, but it is recreated reliably when I repeat the
> test case. I tried the fix patch, and it avoided the hang :)
> Thanks for the fix!
Ah, nice, so at least this one is fixed by the first patch :)
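For reference, the core of the fix below is the "claim once" idiom:
atomic_xchg() stores the new value and returns the previous one in a
single atomic step, so exactly one caller wins the right to schedule the
deletion. Here is a minimal userspace sketch of that pattern, with C11
atomics standing in for the kernel's atomic_t; the names are
illustrative, not kernel API:

/*
 * Sketch only: atomic_exchange() stores 1 and returns the previous
 * value in one step, so only the first caller observes 0 and proceeds.
 */
#include <stdatomic.h>
#include <stdio.h>

struct fake_assoc {
	atomic_int terminating;		/* 0 = live, 1 = deletion claimed */
};

/* Returns 1 if this caller won the right to schedule the deletion. */
static int schedule_delete_once(struct fake_assoc *assoc)
{
	if (atomic_exchange(&assoc->terminating, 1))
		return 0;	/* already claimed by an earlier caller */
	return 1;		/* we are the only scheduler */
}

int main(void)
{
	struct fake_assoc assoc = { 0 };

	printf("first caller schedules:  %d\n", schedule_delete_once(&assoc));
	printf("second caller schedules: %d\n", schedule_delete_once(&assoc));
	return 0;
}

Because the patch performs this check before queue_work(), a second
shutdown path can no longer re-queue del_work after the first run has
already released the association's resources.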
From b0db044f5e828d5c12c368fecd17327f7a6e854d Mon Sep 17 00:00:00 2001
From: Daniel Wagner <wagi@kernel.org>
Date: Thu, 28 Aug 2025 13:18:21 +0200
Subject: [PATCH] nvmet-fc: avoid scheduling association deletion twice
When forcefully shutting down a port via the configfs interface,
nvmet_port_subsys_drop_link() first calls nvmet_port_del_ctrls() and
then nvmet_disable_port(). Both functions will eventually schedule all
remaining associations for deletion.
The current implementation checks whether an association is about to be
removed, but only after the work item has already been scheduled. As a
result, it is possible for the first scheduled work item to free all
resources, and then for the same work item to be scheduled again for
deletion.
Because the association list is an RCU list, it is not possible to take
a lock and remove the list entry directly, so unlinking the entry cannot
be used to prevent it from being found and scheduled a second time.
Instead, a flag (terminating) must be used to determine whether the
association is already in the process of being deleted.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/target/fc.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 6725c34dd7c9..7d84527d5a43 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1075,6 +1075,14 @@ nvmet_fc_delete_assoc_work(struct work_struct *work)
static void
nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
+ int terminating;
+
+ terminating = atomic_xchg(&assoc->terminating, 1);
+
+ /* if already terminating, do nothing */
+ if (terminating)
+ return;
+
nvmet_fc_tgtport_get(assoc->tgtport);
if (!queue_work(nvmet_wq, &assoc->del_work))
nvmet_fc_tgtport_put(assoc->tgtport);
@@ -1202,13 +1210,7 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
unsigned long flags;
- int i, terminating;
-
- terminating = atomic_xchg(&assoc->terminating, 1);
-
- /* if already terminating, do nothing */
- if (terminating)
- return;
+ int i;
spin_lock_irqsave(&tgtport->lock, flags);
list_del_rcu(&assoc->a_list);
--
2.51.0