Bug#1071501: Linux NFS client hangs in nfs4_lookup_revalidate
On Mon, 27 May 2024, Richard Kojedzinszky wrote:
> Dear Neil,
>
> I was running it on arm64, may that be the reason?
That could certainly explain it. I know x86_64 almost never needs
barriers, though I have seen cases where they matter. ppc64 is very
sensitive. A quick search suggests that arm64 does need barriers some
times.
I don't have arm64 hardware to test on but I'm happy with your
test results.
Thanks,
NeilBrown
>
> Regards,
> Richard
>
>
> On May 27, 2024 4:02:32 AM GMT+02:00, NeilBrown <neilb@suse.de> wrote:
>
> On Sun, 26 May 2024, Richard Kojedzinszky wrote:
> Dear Neil,
>
> According to my quick tests, your patch seems to fix this bug. Could you
> also manage to try my attached code, could you also reproduce the bug?
>
> Thanks for testing.
>
> I can run your test code but it isn't triggering the bug (90 minutes so
> far). Possibly a different compiler used for the kernel, possibly
> hardware differences (I'm running under qemu). Bugs related to barriers
> (which this one seems to be) need just the right circumstances to
> trigger so they can be hard to reproduce on a different system.
>
> I've made some cosmetic improvements to the patch and will post it to
> the NFS maintainers.
>
> Thanks again,
> NeilBrown
>
>
>
> Thanks,
> Richard
>
> 2024-05-24 07:29 időpontban Richard Kojedzinszky ezt írta:
> Dear Neil,
>
> I've applied your patch, and since then there are no lockups. Before
> that my application reported a lockup in a minute or two, now it has
> been running for half an hour, and still running.
>
> Thanks,
> Richard
>
> 2024-05-24 01:31 időpontban NeilBrown ezt írta:
> On Fri, 24 May 2024, Richard Kojedzinszky wrote:
> Dear devs,
>
> I am attaching a stripped down version of the little program which
> triggers the bug very quickly, in a few minutes in my test lab. It
> turned out that a single NFS mountpoint is enough. Just start the
> program giving it the NFS mount as first argument. It will chdir
> there,
> and do file operations, which will trigger a lockup in a few minutes.
>
> I couldn't get the go code to run. But then it is a long time since I
> played with go and I didn't try very hard.
> If you could provide simple instructions and a list of package
> dependencies that I need to install (on Debian), I can give it a try.
>
> Or you could try this patch. It might help, but I don't have high
> hopes. It adds some memory barriers and fixes a bug which would cause
> a
> problem if memory allocation failed (but memory allocation never
> fails).
>
> NeilBrown
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ac505671efbd..5bcc0d14d519 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1804,7 +1804,7 @@ __nfs_lookup_revalidate(struct dentry *dentry,
> unsigned int flags,
> } else {
> /* Wait for unlink to complete */
> wait_var_event(&dentry->d_fsdata,
> - dentry->d_fsdata != NFS_FSDATA_BLOCKED);
> + smp_load_acquire(&dentry->d_fsdata) != NFS_FSDATA_BLOCKED);
> parent = dget_parent(dentry);
> ret = reval(d_inode(parent), dentry, flags);
> dput(parent);
> @@ -2508,7 +2508,7 @@ int nfs_unlink(struct inode *dir, struct dentry
> *dentry)
> spin_unlock(&dentry->d_lock);
> error = nfs_safe_remove(dentry);
> nfs_dentry_remove_handle_error(dir, dentry, error);
> - dentry->d_fsdata = NULL;
> + smp_store_release(&dentry->d_fsdata, NULL);
> wake_up_var(&dentry->d_fsdata);
> out:
> trace_nfs_unlink_exit(dir, dentry, error);
> @@ -2616,7 +2616,7 @@ nfs_unblock_rename(struct rpc_task *task, struct
> nfs_renamedata *data)
> {
> struct dentry *new_dentry = data->new_dentry;
>
> - new_dentry->d_fsdata = NULL;
> + smp_store_release(&new_dentry->d_fsdata, NULL);
> wake_up_var(&new_dentry->d_fsdata);
> }
>
> @@ -2717,6 +2717,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct
> inode *old_dir,
> task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry,
> must_unblock ? nfs_unblock_rename : NULL);
> if (IS_ERR(task)) {
> + if (must_unlock) {
> + smp_store_release(&new_dentry->d_fsdata, NULL);
> + wake_up_var(&new_dentry->d_fsdata);
> + }
> error = PTR_ERR(task);
> goto out;
> }
>
>
>
>
>
>
Reply to: