
Bug#1017720: nfs-common: No such file or directory



After running stricter tests on more servers, I can confirm that this behavior also exists in Debian Buster 10.8, specifically with the 4.19.X kernel.
With the 4.19.X kernel the issue resolves itself immediately after the No such file or directory error appears, whereas with the 5.X kernels I have to clear the dentry and inode cache by running echo 2 > /proc/sys/vm/drop_caches.
What further information is required to resolve this issue?
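
In case it helps anyone reproduce this, here is a minimal sketch of the check behind the ls test quoted below. It prints every name that the directory listing returns but that cannot be stat'ed, which is what the "-????????? ?" rows indicate. The function name find_stale_entries is made up for illustration and is not part of nfs-common or any other package. It assumes GNU coreutils and file names without embedded newlines.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): list directory entries
# that the listing returns but that cannot be stat'ed.
find_stale_entries() {
    dir=$1
    # ls -A only reads the directory; it does not need to stat each entry.
    ls -A -- "$dir" | while IFS= read -r name; do
        # stat does not follow symlinks by default, so a dangling symlink
        # is not reported; only a failing stat on the entry itself is.
        if ! stat -- "$dir/$name" >/dev/null 2>&1; then
            printf '%s\n' "$dir/$name"
        fi
    done
}
```

On an affected client, running find_stale_entries /mnt/dir/someOtherDir before and after echo 2 > /proc/sys/vm/drop_caches (as root) should show whether the stale names disappear.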

> -----Original Message-----
> From: Jason Breitman
> Sent: Tuesday, September 13, 2022 4:41 PM
> To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> I downgraded the nfs-common package which required the downgrade of
> the libevent packages and am using the 4.19.X kernel.
> I see the issue running the initial test, but then the issue is gone when
> running the test a subsequent time.
> 
> libevent-2.1-6:amd64           2.1.8-stable-4       amd64  Asynchronous event notification library
> libevent-core-2.1-6:amd64      2.1.8-stable-4       amd64  Asynchronous event notification library (core)
> libevent-pthreads-2.1-6:amd64  2.1.8-stable-4       amd64  Asynchronous event notification library (pthreads)
> linux-image-4.19.0-21-amd64    4.19.249-2           amd64  Linux 4.19 for 64-bit PCs (signed)
> nfs-common                     1:1.3.4-2.5+deb10u1  amd64  NFS support files common to client and server
> 
> What other packages do I need to downgrade in order to get Debian 11.4 to
> behave like Debian 10.8?
> What additional questions can I answer so that we can move forward?
> 
> > -----Original Message-----
> > From: Jason Breitman
> > Sent: Tuesday, September 6, 2022 5:18 PM
> > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > I also see the failure with the kernels below, but the 4.19.X kernel resolves
> > the issue without dropping caches.
> > linux-image-4.19.0-14-amd64  4.19.171-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> > linux-image-4.19.0-21-amd64  4.19.249-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> >
> > I see the issue running the initial test, but then the issue is gone when
> > running the test a subsequent time.
> > I ran several tests to verify the behavior differences between the 4.19.X
> > and 5.X kernels.
> >
> > -- Test
> >     ls -l /mnt/dir/someOtherDir/* | grep '?'
> >
> > -- Error message - the error message is showing files that have been erased
> > via rsync --delete
> >     ls: cannot access 'filename': No such file or directory
> >     -????????? ? ?    ?            ?            ? filename
> >
> > > -----Original Message-----
> > > From: Jason Breitman
> > > Sent: Friday, September 2, 2022 5:17 PM
> > > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > I have tested with the following kernels and see this issue in each case.
> > >
> > > linux-image-5.10.0-16-amd64         5.10.127-1         amd64  Linux 5.10 for 64-bit PCs (signed)
> > > linux-image-5.15.0-0.bpo.3-amd64    5.15.15-2~bpo11+1  amd64  Linux 5.15 for 64-bit PCs (signed)
> > > linux-image-5.18.0-0.deb11.3-amd64  5.18.14-1~bpo11+1  amd64  Linux 5.18 for 64-bit PCs (signed)
> > >
> > > An interesting note is that when using the 5.18 kernel, I had to run
> > > echo 3 > /proc/sys/vm/drop_caches to resolve the issue.
> > > echo 2 > /proc/sys/vm/drop_caches did not work as it did on the 5.10 and
> > > 5.15 kernels.
> > >
> > > > -----Original Message-----
> > > > From: Jason Breitman
> > > > Sent: Friday, August 26, 2022 3:36 PM
> > > > To: 'Ben Hutchings' <ben@decadent.org.uk>; '1017720@bugs.debian.org'
> > > > <1017720@bugs.debian.org>
> > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > I was able to identify another workaround today which may help you to
> > > > identify the issue.
> > > > The workaround is to touch the directory where the troubled files live on
> > > > the file server.
> > > > I believe this tells us that the cache uses the directory's modify time
> > > > attribute to decide when to refresh.
> > > > It should be noted that access time updates are disabled on the file server.
> > > >
> > > > I also wanted to restate that we use rsync to push out these application
> > > > updates and also use rsync to sync data files.
> > > > Our rsync options preserve timestamps, so it is possible that the new files
> > > > have an older timestamp than "now".
> > > > It is not the case that the new files have an older timestamp than the prior
> > > > version that is stuck in the cache.
> > > >
> > > > The rsync process that I describe has not changed and has been in use for
> > > > many years.
> > > >
> > > > > -----Original Message-----
> > > > > From: Jason Breitman
> > > > > Sent: Thursday, August 25, 2022 11:54 AM
> > > > > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > >
> > > > > I have the same issue after adding actimeo=30 to /etc/fstab, rebooting and
> > > > > testing.
> > > > > I also confirmed via /proc/mounts that those settings were applied; it shows
> > > > > the snippet below for each mountpoint.
> > > > > nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0 0
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jason Breitman
> > > > > > Sent: Tuesday, August 23, 2022 2:42 PM
> > > > > > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > >
> > > > > > What additional information can I provide for us to move forward with
> > > > > > this process?
> > > > > >
> > > > > > To summarize and include further details, rsync is used to sync
> > > > > > applications to a file server which behaves like a repository.
> > > > > > We do preserve timestamps from the build server and also use --delete.
> > > > > > We do not run the applications from the file server.  All servers use NTP.
> > > > > >
> > > > > > The application has a sub-directory that contains files with version
> > > > > > numbers.  These are libraries.
> > > > > > When a new build is complete, a developer pushes their updates via rsync
> > > > > > to the file server / repository.
> > > > > >
> > > > > > I believe that the dentry cache thinks the "old" files exist and generates
> > > > > > a No such file or directory error showing question marks for those files'
> > > > > > attributes.
> > > > > > Dropping the dentry cache via echo 2 > /proc/sys/vm/drop_caches resolves
> > > > > > the issue.
> > > > > >
> > > > > > This behavior is not observed in Debian 10.8 with that distribution's
> > > > > > associated kernel and packages.
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jason Breitman
> > > > > > > Sent: Friday, August 19, 2022 9:52 PM
> > > > > > > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ben Hutchings <ben@decadent.org.uk>
> > > > > > > > Sent: Friday, August 19, 2022 7:27 PM
> > > > > > > > To: Jason Breitman <jbreitman@tildenparkcapital.com>;
> > > > > > > > 1017720@bugs.debian.org
> > > > > > > > Subject: Re: Bug#1017720: nfs-common: No such file or directory
> > > > > > > >
> > > > > > > > Control: tag -1 moreinfo
> > > > > > > >
> > > > > > > > On Fri, 2022-08-19 at 13:16 +0000, Jason Breitman wrote:
> > > > > > > > > Package: nfs-common
> > > > > > > > > Version: 1:1.3.4-6
> > > > > > > > > Severity: important
> > > > > > > > >
> > > > > > > > > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > > > > > > > > GNU/Linux
> > > > > > > > >
> > > > > > > > > -- Description
> > > > > > > > >     After updating and/or creating new files on our file server via
> > > > > > > > > rsync, we see many files report the error message below from NFSv4
> > > > > > > > > clients since upgrading from Debian 10.8 to Debian 11.4.
> > > > > > > > >     Clearing the dentry cache resolves the issue right away.
> > > > > > > > >     I am not sure that nfs-common is the package to blame, but listed
> > > > > > > > > it based on the bug submission recommendations.
> > > > > > > >
> > > > > > > > The NFS implementation is mostly in the kernel, so probably this issue
> > > > > > > > belongs there.  But the kernel team is responsible for both packages.
> > > > > > > >
> > > > > > > > [...]
> > > > > > > > > -- Error message
> > > > > > > > >     ls: cannot access 'filename': No such file or directory
> > > > > > > > >     -????????? ? ?    ?            ?            ? filename
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > So we know the file's there but can't stat it.  I think this means the
> > > > > > > > client has cached the handle of the old file of that name, which has
> > > > > > > > been deleted.
> > > > > > > >
> > > > > > > > - Are client and server clocks closely synchronised?  If not, that
> > > > > > > > needs to be fixed.
> > > > > > > >
> > > > > > > The clocks are synchronized using NTP.
> > > > > > >
> > > > > > > > - Are clients likely to read this directory while rsync is running, or
> > > > > > > > shortly before?  If so, it may help to reduce the attribute caching
> > > > > > > > timeout on the client.  See the "Directory entry caching" section in
> > > > > > > > the nfs(5) manual page.
> > > > > > > >
> > > > > > > Clients are not likely to read this directory while rsync is running for
> > > > > > > the observed cases.  That can happen in our environment, but not in this
> > > > > > > case.
> > > > > > > I am using the lookupcache=pos option.  I tried noac, but the performance
> > > > > > > penalty was too much.  Which option are you referring to and what setting
> > > > > > > do you recommend testing?
> > > > > > >
> > > > > > > > I don't know why you're only seeing this after an upgrade of the
> > > > > > > > clients, though.  I'm not aware that there has been any big change to
> > > > > > > > attribute caching.
> > > > > > > >
> > > > > > > I appreciate you responding to my report and am happy to answer any
> > > > > > > questions.
> > > > > > > We have multiple monitors and log scrapers to detect "file not found"
> > > > > > > exceptions that would let us know if this was happening before.
> > > > > > > To share more, I have 2 environments mounting from the same file server.
> > > > > > > Each environment has several servers.  The issue is only seen in the
> > > > > > > environment running Debian 11.4.
> > > > > > > I also should have mentioned that the files in question have a version
> > > > > > > number appended, e.g. filename-1111.  When the file is updated via rsync,
> > > > > > > it is called filename-1112 and the prior file is removed.  The error is
> > > > > > > about filename-1111.
> > > > > > > I am not sure if this is the proper terminology, but the issue appears to
> > > > > > > be the negative dentry cache.
> > > > > > >
> > > > > > > > Ben.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Ben Hutchings
> > > > > > > > Beware of bugs in the above code;
> > > > > > > > I have only proved it correct, not tried it. - Donald Knuth
> > > > > > >
> > > > > > > Jason Breitman
> > > > > > Jason Breitman
> > > > > Jason Breitman
> > > > Jason Breitman
> > > Jason Breitman
> > Jason Breitman
> Jason Breitman
Jason Breitman
