
Bug#1017720: nfs-common: No such file or directory



Today I identified another workaround, which may help you narrow down the issue.
The workaround is to touch, on the file server, the directory that contains the affected files.
I believe this tells us that the client cache depends on the directory's modify time attribute being updated.
Note that access time updates are disabled on the file server.
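
A minimal Python sketch of the workaround follows, for anyone who wants to script it instead of running touch by hand. It sets the directory's modification time to "now" and leaves the access time as it was. The function name bump_dir_mtime and the temporary demo directory are placeholders, not anything from our environment, and whether the NFS client reacts to the modification time itself or to a related attribute is an assumption that has not been verified.

```python
import os
import tempfile
import time
from typing import Tuple


def bump_dir_mtime(path: str) -> Tuple[int, int]:
    """Set the mtime of `path` to now, keeping its atime.

    Returns (old_mtime_ns, new_mtime_ns) so the change can be confirmed.
    """
    before = os.stat(path)
    # Keep the existing atime; only the modification time moves forward.
    os.utime(path, ns=(before.st_atime_ns, time.time_ns()))
    after = os.stat(path)
    return before.st_mtime_ns, after.st_mtime_ns


if __name__ == "__main__":
    # Hypothetical stand-in for the directory on the file server.
    with tempfile.TemporaryDirectory() as demo_dir:
        old_ns, new_ns = bump_dir_mtime(demo_dir)
        print("mtime moved forward:", new_ns >= old_ns)
```

This would be run on the file server against the directory holding the affected files, not on the NFS clients.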

I also want to restate that we use rsync both to push out these application updates and to sync data files.
Our rsync options preserve timestamps, so the new files can have a timestamp older than "now".
The new files do not, however, have a timestamp older than the prior version that is stuck in the cache.
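
To make the timestamp point concrete, here is a rough Python sketch that reports how far behind "now" the modification time of a directory and of each entry in it is. With preserved timestamps, freshly synced files can show a large age even though they just arrived. The function name mtime_ages and the "." key for the directory itself are illustrative choices only, and this is a diagnostic aid rather than a statement about how the NFS client decides what to cache.

```python
import os
import time
from typing import Dict, Optional


def mtime_ages(directory: str, now: Optional[float] = None) -> Dict[str, float]:
    """Return seconds between `now` and the mtime of the directory and its entries.

    The directory itself is reported under the key ".".
    """
    now = time.time() if now is None else now
    ages = {".": now - os.stat(directory).st_mtime}
    with os.scandir(directory) as entries:
        for entry in entries:
            # Do not follow symlinks, so the entry itself is what gets measured.
            ages[entry.name] = now - entry.stat(follow_symlinks=False).st_mtime
    return ages


if __name__ == "__main__":
    for name, age in sorted(mtime_ages(".").items()):
        print(f"{age:12.0f}s  {name}")
```

A large age for a file that was only just pushed is expected when timestamps are preserved. Comparing the age of the directory before and after a push shows whether the directory's own timestamp moved.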

The rsync process that I describe has not changed and has been in use for many years.

> -----Original Message-----
> From: Jason Breitman
> Sent: Thursday, August 25, 2022 11:54 AM
> To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> I have the same issue after adding actimeo=30 to /etc/fstab, rebooting and
> testing.
> I also confirmed via /proc/mounts that those settings were applied; it shows
> the snippet below for each mountpoint.
> nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0 0
> 
> > -----Original Message-----
> > From: Jason Breitman
> > Sent: Tuesday, August 23, 2022 2:42 PM
> > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > What additional information can I provide for us to move forward with this
> > process?
> >
> > To summarize and include further details, rsync is used to sync applications to
> > a file server which behaves like a repository.
> > We do preserve timestamps from the build server and also use --delete.  We
> > do not run the applications from the file server.  All servers use NTP.
> >
> > The application has a sub-directory that contains files with version numbers.
> > These are libraries.
> > When a new build is complete, a developer pushes their updates via rsync to
> > the file server / repository.
> >
> > I believe that the dentry cache thinks the "old" files exist and generates a No
> > such file or directory error showing question marks for that file's attributes.
> > Dropping the dentry cache via echo 2 > /proc/sys/vm/drop_caches resolves
> > the issue.
> >
> > This behavior is not observed in Debian 10.8 with that distribution's associated
> > kernel and packages.
> >
> > > -----Original Message-----
> > > From: Jason Breitman
> > > Sent: Friday, August 19, 2022 9:52 PM
> > > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > > -----Original Message-----
> > > > From: Ben Hutchings <ben@decadent.org.uk>
> > > > Sent: Friday, August 19, 2022 7:27 PM
> > > > To: Jason Breitman <jbreitman@tildenparkcapital.com>;
> > > > 1017720@bugs.debian.org
> > > > Subject: Re: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > Control: tag -1 moreinfo
> > > >
> > > > On Fri, 2022-08-19 at 13:16 +0000, Jason Breitman wrote:
> > > > > Package: nfs-common
> > > > > Version: 1:1.3.4-6
> > > > > Severity: important
> > > > >
> > > > > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > > > > GNU/Linux
> > > > >
> > > > > -- Description
> > > > >     After updating and/or creating new files on our file server via
> > > > > rsync, we see many files report the error message below from NFSv4
> > > > > clients since upgrading from Debian 10.8 to Debian 11.4.
> > > > >     Clearing the dentry cache resolves the issue right away.
> > > > >     I am not sure that nfs-common is the package to blame, but listed
> > > > > it based on the bug submission recommendations.
> > > >
> > > > The NFS implementation is mostly in the kernel, so probably this issue
> > > > belongs there.  But the kernel team is responsible for both packages.
> > > >
> > > > [...]
> > > > > -- Error message
> > > > >     ls: cannot access 'filename': No such file or directory
> > > > >     -????????? ? ?    ?            ?            ? filename
> > > > [...]
> > > >
> > > > So we know the file's there but can't stat it.  I think this means the
> > > > client has cached the handle of the old file of that name, which has
> > > > been deleted.
> > > >
> > > > - Are client and server clocks closely synchronised?  If not, that
> > > > needs to be fixed.
> > > >
> > > The clocks are synchronized using NTP.
> > >
> > > > - Are clients likely to read this directory while rsync is running, or
> > > > shortly before?  If so, it may help to reduce the attribute caching
> > > > timeout on the client.  See the "Directory entry caching" section in
> > > > the nfs(5) manual page.
> > > >
> > > Clients are not likely to read this directory while rsync is running for the
> > > observed cases.  That can happen in our environment, but not in this case.
> > > I am using the lookupcache=pos option.  I tried noac, but the performance
> > > penalty was too much.  Which option are you referring to and what setting
> > > do you recommend testing?
> > >
> > > > I don't know why you're only seeing this after an upgrade of the
> > > > clients, though.  I'm not aware that there has been any big change to
> > > > attribute caching.
> > > >
> > > I appreciate you responding to my report and am happy to answer any
> > > questions.
> > > We have multiple monitors and log scrapers to detect "file not found"
> > > exceptions that would let us know if this was happening before.
> > > To share more, I have 2 environments mounting from the same file server.
> > > Each environment has several servers.  The issue is only seen in the
> > > environment running Debian 11.4.
> > > I also should have mentioned that the files in question have a version
> > > number appended, e.g. filename-1111.  When the file is updated via rsync, it is
> > > called filename-1112 and the prior file is removed.  The error is about
> > > filename-1111.
> > > I am not sure if this is the proper terminology, but the issue appears to be
> > > the negative dentry cache.
> > >
> > > > Ben.
> > > >
> > > > --
> > > > Ben Hutchings
> > > > Beware of bugs in the above code;
> > > > I have only proved it correct, not tried it. - Donald Knuth
> > >
> > > Jason Breitman
> > Jason Breitman
> Jason Breitman
Jason Breitman
