[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1071501: [PATCH] NFS: add barriers when testing for NFS_FSDATA_BLOCKED



On Wed, 29 May 2024, Trond Myklebust wrote:
> On Tue, 2024-05-28 at 11:19 +1000, NeilBrown wrote:
> > On Tue, 28 May 2024, Trond Myklebust wrote:
> > > On Mon, 2024-05-27 at 13:04 +1000, NeilBrown wrote:
> > > > 
> > > > dentry->d_fsdata is set to NFS_FSDATA_BLOCKED while unlinking or
> > > > renaming-over a file to ensure that no open succeeds while the
> > > > NFS
> > > > operation progressed on the server.
> > > > 
> > > > Setting dentry->d_fsdata to NFS_FSDATA_BLOCKED is done under -
> > > > >d_lock
> > > > after checking the refcount is not elevated.  Any attempt to open
> > > > the
> > > > file (through that name) will go through lookp_open() which will
> > > > take
> > > > ->d_lock while incrementing the refcount, we can be sure that
> > > > once
> > > > the
> > > > new value is set, __nfs_lookup_revalidate() *will* see the new
> > > > value
> > > > and
> > > > will block.
> > > > 
> > > > We don't have any locking guarantee that when we set ->d_fsdata
> > > > to
> > > > NULL,
> > > > the wait_var_event() in __nfs_lookup_revalidate() will notice.
> > > > wait/wake primitives do NOT provide barriers to guarantee order. 
> > > > We
> > > > must use smp_load_acquire() in wait_var_event() to ensure we look
> > > > at
> > > > an
> > > > up-to-date value, and must use smp_store_release() before
> > > > wake_up_var().
> > > > 
> > > > This patch adds those barrier functions and factors out
> > > > block_revalidate() and unblock_revalidate() far clarity.
> > > > 
> > > > There is also a hypothetical bug in that if memory allocation
> > > > fails
> > > > (which never happens in practice) we might leave ->d_fsdata
> > > > locked.
> > > > This patch adds the missing call to unblock_revalidate().
> > > > 
> > > > Reported-and-tested-by: Richard Kojedzinszky
> > > > <richard+debian+bugreport@kojedz.in>
> > > > Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071501
> > > > Fixes: 3c59366c207e ("NFS: don't unhash dentry during
> > > > unlink/rename")
> > > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > > ---
> > > >  fs/nfs/dir.c | 44 +++++++++++++++++++++++++++++---------------
> > > >  1 file changed, 29 insertions(+), 15 deletions(-)
> > > > 
> > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > index ac505671efbd..c91dc36d41cc 100644
> > > > --- a/fs/nfs/dir.c
> > > > +++ b/fs/nfs/dir.c
> > > > @@ -1802,9 +1802,10 @@ __nfs_lookup_revalidate(struct dentry
> > > > *dentry,
> > > > unsigned int flags,
> > > >  		if (parent != READ_ONCE(dentry->d_parent))
> > > >  			return -ECHILD;
> > > >  	} else {
> > > > -		/* Wait for unlink to complete */
> > > > +		/* Wait for unlink to complete - see
> > > > unblock_revalidate() */
> > > >  		wait_var_event(&dentry->d_fsdata,
> > > > -			       dentry->d_fsdata !=
> > > > NFS_FSDATA_BLOCKED);
> > > > +			       smp_load_acquire(&dentry-
> > > > >d_fsdata)
> > > > +			       != NFS_FSDATA_BLOCKED);
> > > 
> > > Doesn't this end up being a reversed ACQUIRE+RELEASE as described
> > > in
> > > the "LOCK ACQUISITION FUNCTIONS" section of Documentation/memory-
> > > barriers.txt?
> > 
> > I don't think so.  That section is talking about STORE operations
> > which
> > can move backwards through ACQUIRE and forwards through RELEASE.
> > 
> > Above we have a LOAD operation which mustn't move backwards through
> > ACQUIRE.  Below there is a STORE operation which mustn't move
> > forwards
> > through a RELEASE.  Both of those are guaranteed.
> 
> It isn't necessary for the LOAD to move backwards through the ACQUIRE.
> As I understand it, the point is that the ACQUIRE can move through the
> RELEASE as per the following paragraph in that document:
> 
>             Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
>             not imply a full memory barrier.  Therefore, the CPU's execution of the
>             critical sections corresponding to the RELEASE and the ACQUIRE can cross,
>             so that:
>             
>                     *A = a;
>                     RELEASE M
>                     ACQUIRE N
>                     *B = b;
>             
>             could occur as:
>             
>                     ACQUIRE N, STORE *B, STORE *A, RELEASE M

On the wakeup side we have:

  STORE ->d_fsdata
  RELEASE
  spin_lock
  LOAD: check is waitq is empty

and on the wait side we have

  STORE: add to waitq
  spin_unlock
  ACQUIRE
  LOAD ->d_fsdata

I believe that spin_lock is an ACQUIRE operation and spin_unlock is a
RELEASE operation.  So both of these have "ACQUIRE ; RELEASE" which
creates a full barrier.

> 
> This would presumably be why the function clear_and_wake_up_bit() needs
> a full memory barrier on most architectures, instead of being just an
> smp_wmb(). Is my understanding of this wrong?

clear_and_wake_up_bit() calls __wake_up_bit() which calls
waitqueue_active() without taking a lock.  So there is no ACQUIRE after
the RELEASE in clear_bit_unlock() and before testing for list-empty.  So
we need to explicitly add a barrier.

At least that is my understanding just now.  I hadn't worked through all
the details before, but I'm glad you prompted me to - thanks.

NeilBrown


Reply to: