Re: blktests failures with v6.17-rc1 kernel
On Sep 01, 2025 / 11:02, Daniel Wagner wrote:
> On Mon, Sep 01, 2025 at 10:34:23AM +0200, Daniel Wagner wrote:
> > The test is removing the ports while the host driver is about to
> > reconnect and accesses a stale pointer.
> >
> > nvme_fc_create_association is calling nvme_fc_ctlr_inactive_on_rport in
> > the error path. The problem is that nvme_fc_create_association gets half
> > through the setup and then fails. In the cleanup path
> >
> > 	dev_warn(ctrl->ctrl.device,
> > 		"NVME-FC{%d}: create_assoc failed, assoc_id %llx ret %d\n",
> > 		ctrl->cnum, ctrl->association_id, ret);
> >
> > is issued and then nvme_fc_ctlr_inactive_on_rport is called. And there
> > is the log message above, so it's clear the error path is taken.
> >
> > But fcloop is not supposed to remove the ports while the host driver
> > is still using them. So there is a race window in which
> > nvme_fc_create_association can be entered while fcloop is removing
> > the ports.
> >
> > The window is between entering nvme_fc_create_association and the
> > call to nvme_fc_ctlr_active_on_rport.
>
> I think the problem is that nvme_fc_create_association is not holding
> the rport locks when checking the port_state and marking the rport
> active. This races with nvme_fc_unregister_remoteport.
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 3e12d4683ac7..03987f497a5b 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -3032,11 +3032,17 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>
>  	++ctrl->ctrl.nr_reconnects;
>
> -	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
> +	spin_lock_irqsave(&ctrl->rport->lock, flags);
> +	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE) {
> +		spin_unlock_irqrestore(&ctrl->rport->lock, flags);
>  		return -ENODEV;
> +	}
>
> -	if (nvme_fc_ctlr_active_on_rport(ctrl))
> +	if (nvme_fc_ctlr_active_on_rport(ctrl)) {
> +		spin_unlock_irqrestore(&ctrl->rport->lock, flags);
>  		return -ENOTUNIQ;
> +	}
> +	spin_unlock_irqrestore(&ctrl->rport->lock, flags);
>
>  	dev_info(ctrl->ctrl.device,
>  		"NVME-FC{%d}: create association : host wwpn 0x%016llx "
>
> I'll try to reproduce it and see if this patch makes a difference.
I applied the fix patch above together with the previous fix patch on top of
v6.17-rc3, then repeated nvme/061 with the fc transport hundreds of times. I
did not observe the KASAN slab-use-after-free. The fix patch looks good. Thanks!
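
For anyone following the thread, the locking pattern in the patch can be
modeled in userspace roughly as below. This is only a sketch and not the
driver code: a pthread mutex stands in for the rport spinlock, and all names
(struct rport_model, rport_model_try_activate, rport_model_unregister) are
hypothetical. The point is that the state check and the "mark active" step
happen under the same lock that the unregister path takes, so a controller
cannot be marked active on a port that is already going away.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the remote port state. */
enum port_state_model { PORT_ONLINE, PORT_DELETED };

/* Hypothetical userspace model of an rport; not the kernel structure. */
struct rport_model {
	pthread_mutex_t lock;        /* stands in for the rport spinlock */
	enum port_state_model state;
	int act_ctrl_cnt;            /* number of active controllers */
};

static void rport_model_init(struct rport_model *r)
{
	pthread_mutex_init(&r->lock, NULL);
	r->state = PORT_ONLINE;
	r->act_ctrl_cnt = 0;
}

/*
 * Check the port state and mark the controller active in one critical
 * section. Returns 0 on success, -ENODEV if the port is not online.
 */
static int rport_model_try_activate(struct rport_model *r)
{
	int ret = 0;

	pthread_mutex_lock(&r->lock);
	if (r->state != PORT_ONLINE)
		ret = -ENODEV;
	else
		r->act_ctrl_cnt++;
	pthread_mutex_unlock(&r->lock);

	return ret;
}

/* Unregister path: changes the state under the same lock. */
static void rport_model_unregister(struct rport_model *r)
{
	pthread_mutex_lock(&r->lock);
	r->state = PORT_DELETED;
	pthread_mutex_unlock(&r->lock);
}
```

In this model, calling rport_model_try_activate() after
rport_model_unregister() fails with -ENODEV and leaves act_ctrl_cnt
unchanged, which mirrors the early return in the patched error path.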