Re: [PATCH v2] nbd: Don't use workqueue to handle recv work

To: Josef Bacik <josef@toxicpanda.com>
Cc: Christoph Hellwig <hch@infradead.org>, Jens Axboe <axboe@kernel.dk>, Bart Van Assche <bvanassche@acm.org>, linux-block@vger.kernel.org, nbd@other.debian.org, linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] nbd: Don't use workqueue to handle recv work
From: Yongji Xie <xieyongji@bytedance.com>
Date: Tue, 15 Feb 2022 21:17:37 +0800
Message-id: <[🔎] CACycT3thVwb466u2JR-oDRHLY5j_uxAx5uXXGmaoCZL5vs__mQ@mail.gmail.com>
In-reply-to: <CACycT3tTKBpS_B5vVJ8MZ1iuaF2bf-01=9+tAdxUddziF2DQ-g@mail.gmail.com>
References: <20211227091241.103-1-xieyongji@bytedance.com> <Ycycda8w/zHWGw9c@infradead.org> <CACycT3usfTdzmK=gOsBf3=-0e8HZ3_0ZiBJqkWb_r7nki7xzYA@mail.gmail.com> <YdMgCS1RMcb5V2RJ@localhost.localdomain> <CACycT3vYt0XNV2GdjKjDS1iyWieY_OV4h=W1qqk_AAAahRZowA@mail.gmail.com> <YdSMqKXv0PUkAwfl@localhost.localdomain> <CACycT3tPZOSkCXPz-oYCXRJ_EOBs3dC0+Juv=FYsa6qRS0GVCw@mail.gmail.com> <CACycT3tTKBpS_B5vVJ8MZ1iuaF2bf-01=9+tAdxUddziF2DQ-g@mail.gmail.com>

Ping again.

Hi Josef, could you take a look?

On Fri, Jan 21, 2022 at 4:34 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> Ping.
>
> On Wed, Jan 5, 2022 at 1:36 PM Yongji Xie <xieyongji@bytedance.com> wrote:
> >
> > On Wed, Jan 5, 2022 at 2:06 AM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > On Tue, Jan 04, 2022 at 01:31:47PM +0800, Yongji Xie wrote:
> > > > On Tue, Jan 4, 2022 at 12:10 AM Josef Bacik <josef@toxicpanda.com> wrote:
> > > > >
> > > > > On Thu, Dec 30, 2021 at 12:01:23PM +0800, Yongji Xie wrote:
> > > > > > On Thu, Dec 30, 2021 at 1:35 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 27, 2021 at 05:12:41PM +0800, Xie Yongji wrote:
> > > > > > > > The rescuer thread might take over the works queued on
> > > > > > > > the workqueue when the worker thread creation timed out.
> > > > > > > > If this happens, we have no chance to create multiple
> > > > > > > > recv threads which causes I/O hung on this nbd device.
> > > > > > >
> > > > > > > If a workqueue is used there aren't really 'receive threads'.
> > > > > > > What is the deadlock here?
> > > > > >
> > > > > > We might have multiple recv works, and those recv works won't quit
> > > > > > unless the socket is closed. If the rescuer thread takes over those
> > > > > > works, only the first recv work can run. The I/O needed to be handled
> > > > > > in other recv works would be hung since no thread can handle them.
> > > > > >
> > > > >
> > > > > I'm not following this explanation.  What is the rescuer thread you're talking
> > > >
> > > > https://www.kernel.org/doc/html/latest/core-api/workqueue.html#c.rescuer_thread
> > > >
> > >
> > > Ahhh ok now I see, thanks, I didn't know this is how this worked.
> > >
> > > So what happens is we do the queue_work(), this needs to do a GFP_KERNEL
> > > allocation internally, we are unable to satisfy this, and thus the work gets
> > > pushed onto the rescuer thread.
> > >
> > > Then the rescuer thread can't be used in the future because it's doing this long
> > > running thing.
> > >
> >
> > Yes.
> >
> > > I think the correct thing to do here is simply drop the WQ_MEM_RECLAIM bit.  It
> > > makes sense for workqueue's that are handling the work of short lived works that
> > > are in the memory reclaim path.  That's not what these workers are doing, yes
> > > they are in the reclaim path, but they run the entire time the device is up.
> > > The actual work happens as they process incoming requests.  AFAICT
> > > WQ_MEM_RECLAIM doesn't affect the actual allocations that the worker thread
> > > needs to do, which is what I think the intention was in using WQ_MEM_RECLAIM,
> > > which isn't really what it's used for.
> > >
> > > tl;dr, just remove thee WQ_MEM_RECLAIM flag completely and I think that's good
> > > enough?  Thanks,
> > >
> >
> > In the reconnect case, we still need to call queue_work() while the
> > device is running. So it looks like we can't simply remove the
> > WQ_MEM_RECLAIM flag.
> >
> > Thanks,
> > Yongji

Reply to:

Follow-Ups:
- Re: [PATCH v2] nbd: Don't use workqueue to handle recv work
  - From: Josef Bacik <josef@toxicpanda.com>

Prev by Date: From Karen Smith Trust
Next by Date: Re: [PATCH v2] nbd: Don't use workqueue to handle recv work
Previous by thread: From Karen Smith Trust
Next by thread: Re: [PATCH v2] nbd: Don't use workqueue to handle recv work
Index(es):
- Date
- Thread