
Bug#1017720: nfs-common: No such file or directory



The issue also occurs when using the lookupcache=none option along with the 5.10.X kernel.
I was hoping this option would work so that I could then evaluate its performance impact, but it is no longer viable.
I believe that I am out of options to try with the 5.10.X kernel.
Please let me know where we stand.
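The `ls -l /mnt/dir/someOtherDir/* | grep '?'` test used throughout this thread looks for directory entries that the listing still returns but that cannot be stat'ed ("we know the file's there but can't stat it"). Below is a minimal Python sketch of the same check. It assumes a stale entry shows up as a listed name whose `lstat()` fails with ENOENT, as in the `ls` error quoted in this report. The function name `find_stale_entries` and its `names` parameter are hypothetical and exist only for illustration.

```python
import os


def find_stale_entries(directory, names=None):
    """Return entry names in `directory` that are listed but cannot be stat'ed.

    Mirrors the `ls -l ... | grep '?'` check: ls prints '?' for each
    attribute of an entry whose lstat() fails.  `names` is a hypothetical
    hook for supplying the listing explicitly; by default the directory is
    read with os.listdir(), which, unlike a shell glob, includes dot-files.
    """
    if names is None:
        names = os.listdir(directory)
    stale = []
    for name in sorted(names):
        try:
            # lstat() does not follow symlinks, matching ls -l on a directory.
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            # ENOENT for a name the listing just returned: a stale entry.
            stale.append(name)
    return stale
```

For example, `find_stale_entries("/mnt/dir/someOtherDir")` would return the names that `ls -l` shows with `?` attributes. An empty list means every listed entry could be stat'ed. Errors other than ENOENT are deliberately not caught, so permission or I/O problems are not mistaken for stale entries.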

> -----Original Message-----
> From: Jason Breitman
> Sent: Wednesday, September 21, 2022 1:01 PM
> To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> After running stricter testing on more servers, I now know that this behavior
> does exist in Debian Buster 10.8, specifically with the 4.19.X kernel.
> With the 4.19.X kernel, the issue resolves itself immediately after the No
> such file or directory error. The 5.X kernels differ: they require me to clear
> the inode and dentry cache by running echo 2 > /proc/sys/vm/drop_caches.
> What further information is required to resolve this issue?
> 
> > -----Original Message-----
> > From: Jason Breitman
> > Sent: Tuesday, September 13, 2022 4:41 PM
> > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > I downgraded the nfs-common package, which required downgrading the
> > libevent packages, and I am using the 4.19.X kernel.
> > I see the issue running the initial test, but then the issue is gone when
> > running the test a subsequent time.
> >
> > libevent-2.1-6:amd64           2.1.8-stable-4       amd64  Asynchronous event notification library
> > libevent-core-2.1-6:amd64      2.1.8-stable-4       amd64  Asynchronous event notification library (core)
> > libevent-pthreads-2.1-6:amd64  2.1.8-stable-4       amd64  Asynchronous event notification library (pthreads)
> > linux-image-4.19.0-21-amd64    4.19.249-2           amd64  Linux 4.19 for 64-bit PCs (signed)
> > nfs-common                     1:1.3.4-2.5+deb10u1  amd64  NFS support files common to client and server
> >
> > What other packages do I need to downgrade in order to get Debian 11.4 to
> > behave like Debian 10.8?
> > What additional questions can I answer so that we can move forward?
> >
> > > -----Original Message-----
> > > From: Jason Breitman
> > > Sent: Tuesday, September 6, 2022 5:18 PM
> > > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > I also see the failure with the kernels below, but the 4.19.X kernel
> > > resolves the issue without dropping caches.
> > > linux-image-4.19.0-14-amd64  4.19.171-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> > > linux-image-4.19.0-21-amd64  4.19.249-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> > >
> > > I see the issue running the initial test, but then the issue is gone when
> > > running the test a subsequent time.
> > > I ran several tests to verify the behavior differences between the 4.19.X
> > > and 5.X kernels.
> > >
> > > -- Test
> > >     ls -l /mnt/dir/someOtherDir/* | grep '?'
> > >
> > > -- Error message (the files listed have been deleted via rsync --delete)
> > >     ls: cannot access 'filename': No such file or directory
> > >     -????????? ? ?    ?            ?            ? filename
> > >
> > > > -----Original Message-----
> > > > From: Jason Breitman
> > > > Sent: Friday, September 2, 2022 5:17 PM
> > > > To: Ben Hutchings <ben@decadent.org.uk>; 1017720@bugs.debian.org
> > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > I have tested with the following kernels and see this issue in each case.
> > > >
> > > > linux-image-5.10.0-16-amd64         5.10.127-1         amd64  Linux 5.10 for 64-bit PCs (signed)
> > > > linux-image-5.15.0-0.bpo.3-amd64    5.15.15-2~bpo11+1  amd64  Linux 5.15 for 64-bit PCs (signed)
> > > > linux-image-5.18.0-0.deb11.3-amd64  5.18.14-1~bpo11+1  amd64  Linux 5.18 for 64-bit PCs (signed)
> > > >
> > > > Interestingly, when using the 5.18 kernel, I had to run
> > > > echo 3 > /proc/sys/vm/drop_caches to resolve the issue.
> > > > echo 2 > /proc/sys/vm/drop_caches did not work as it did on the 5.10 and
> > > > 5.15 kernels.
> > > >
> > > > > -----Original Message-----
> > > > > From: Jason Breitman
> > > > > Sent: Friday, August 26, 2022 3:36 PM
> > > > > To: 'Ben Hutchings' <ben@decadent.org.uk>;
> > > > > '1017720@bugs.debian.org' <1017720@bugs.debian.org>
> > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > >
> > > > > I was able to identify another workaround today which may help you
> > > > > identify the issue.
> > > > > The workaround is to touch the directory where the troubled files live
> > > > > on the file server.
> > > > > I believe this tells us that the cache relies on updates to the
> > > > > directory's modification time attribute.
> > > > > It should be noted that access time updates are disabled on the file
> > > > > server.
> > > > >
> > > > > I also wanted to restate that we use rsync to push out these
> > > > > application updates and also to sync data files.
> > > > > Our rsync options preserve timestamps, so the new files may have a
> > > > > timestamp older than "now".
> > > > > However, the new files do not have an older timestamp than the prior
> > > > > version that is stuck in the cache.
> > > > >
> > > > > The rsync process that I describe has not changed and has been in use
> > > > > for many years.
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jason Breitman
> > > > > > Sent: Thursday, August 25, 2022 11:54 AM
> > > > > > To: Ben Hutchings <ben@decadent.org.uk>;
> > > > > > 1017720@bugs.debian.org
> > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > >
> > > > > > I have the same issue after adding actimeo=30 to /etc/fstab,
> > > > > > rebooting, and testing.
> > > > > > I also confirmed via /proc/mounts that those settings were applied;
> > > > > > it shows the snippet below for each mount point.
> > > > > >     nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0 0
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jason Breitman
> > > > > > > Sent: Tuesday, August 23, 2022 2:42 PM
> > > > > > > To: Ben Hutchings <ben@decadent.org.uk>;
> > > > > > > 1017720@bugs.debian.org
> > > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > > >
> > > > > > > What additional information can I provide for us to move forward with
> > > > > > > this process?
> > > > > > >
> > > > > > > To summarize and include further details, rsync is used to sync
> > > > > > > applications to a file server which behaves like a repository.
> > > > > > > We do preserve timestamps from the build server and also use --delete.
> > > > > > > We do not run the applications from the file server.  All servers use
> > > > > > > NTP.
> > > > > > >
> > > > > > > The application has a sub-directory that contains files with version
> > > > > > > numbers.  These are libraries.
> > > > > > > When a new build is complete, a developer pushes their updates via
> > > > > > > rsync to the file server / repository.
> > > > > > >
> > > > > > > I believe that the dentry cache thinks the "old" files still exist,
> > > > > > > which produces a No such file or directory error with question marks
> > > > > > > in place of the file's attributes.
> > > > > > > Dropping the dentry cache via echo 2 > /proc/sys/vm/drop_caches
> > > > > > > resolves the issue.
> > > > > > >
> > > > > > > This behavior is not observed in Debian 10.8 with that distribution's
> > > > > > > associated kernel and packages.
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jason Breitman
> > > > > > > > Sent: Friday, August 19, 2022 9:52 PM
> > > > > > > > To: Ben Hutchings <ben@decadent.org.uk>;
> > > > > > > > 1017720@bugs.debian.org
> > > > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Ben Hutchings <ben@decadent.org.uk>
> > > > > > > > > Sent: Friday, August 19, 2022 7:27 PM
> > > > > > > > > To: Jason Breitman <jbreitman@tildenparkcapital.com>;
> > > > > > > > > 1017720@bugs.debian.org
> > > > > > > > > Subject: Re: Bug#1017720: nfs-common: No such file or directory
> > > > > > > > >
> > > > > > > > > Control: tag -1 moreinfo
> > > > > > > > >
> > > > > > > > > On Fri, 2022-08-19 at 13:16 +0000, Jason Breitman wrote:
> > > > > > > > > > Package: nfs-common
> > > > > > > > > > Version: 1:1.3.4-6
> > > > > > > > > > Severity: important
> > > > > > > > > >
> > > > > > > > > > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux
> > > > > > > > > >
> > > > > > > > > > -- Description
> > > > > > > > > >     Since upgrading from Debian 10.8 to Debian 11.4, after updating
> > > > > > > > > >     and/or creating files on our file server via rsync, we see many
> > > > > > > > > >     files report the error message below on NFSv4 clients.
> > > > > > > > > >     Clearing the dentry cache resolves the issue right away.
> > > > > > > > > >     I am not sure that nfs-common is the package to blame, but I
> > > > > > > > > >     listed it based on the bug submission recommendations.
> > > > > > > > >
> > > > > > > > > The NFS implementation is mostly in the kernel, so probably this issue
> > > > > > > > > belongs there.  But the kernel team is responsible for both packages.
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > > > -- Error message
> > > > > > > > > >     ls: cannot access 'filename': No such file or directory
> > > > > > > > > >     -????????? ? ?    ?            ?            ? filename
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > So we know the file's there but can't stat it.  I think this means the
> > > > > > > > > client has cached the handle of the old file of that name, which has
> > > > > > > > > been deleted.
> > > > > > > > >
> > > > > > > > > - Are client and server clocks closely synchronised?  If not, that
> > > > > > > > > needs to be fixed.
> > > > > > > > >
> > > > > > > > The clocks are synchronized using NTP.
> > > > > > > >
> > > > > > > > > - Are clients likely to read this directory while rsync is running, or
> > > > > > > > > shortly before?  If so, it may help to reduce the attribute caching
> > > > > > > > > timeout on the client.  See the "Directory entry caching" section in
> > > > > > > > > the nfs(5) manual page.
> > > > > > > > >
> > > > > > > > Clients are not likely to read this directory while rsync is running in
> > > > > > > > the observed cases.  That can happen in our environment, but not in
> > > > > > > > this case.
> > > > > > > > I am using the lookupcache=pos option.  I tried noac, but the
> > > > > > > > performance penalty was too much.  Which option are you referring to,
> > > > > > > > and what setting do you recommend testing?
> > > > > > > >
> > > > > > > > > I don't know why you're only seeing this after an upgrade of the
> > > > > > > > > clients, though.  I'm not aware that there has been any big change to
> > > > > > > > > attribute caching.
> > > > > > > > >
> > > > > > > > I appreciate you responding to my report and am happy to answer any
> > > > > > > > questions.
> > > > > > > > We have multiple monitors and log scrapers to detect "file not found"
> > > > > > > > exceptions, so we would know if this had been happening before.
> > > > > > > > To share more, I have 2 environments mounting from the same file
> > > > > > > > server.  Each environment has several servers.  The issue is only seen
> > > > > > > > in the environment running Debian 11.4.
> > > > > > > > I also should have mentioned that the files in question have a version
> > > > > > > > number appended, e.g. filename-1111.  When the file is updated via
> > > > > > > > rsync, the new file is called filename-1112 and the prior file is
> > > > > > > > removed.  The error is about filename-1111.
> > > > > > > > I am not sure if this is the proper terminology, but the issue appears
> > > > > > > > to be the negative dentry cache.
> > > > > > > >
> > > > > > > > > Ben.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Ben Hutchings
> > > > > > > > > Beware of bugs in the above code;
> > > > > > > > > I have only proved it correct, not tried it. - Donald Knuth
> > > > > > > >
> > > > > > > > Jason Breitman
> > > > > > > Jason Breitman
> > > > > > Jason Breitman
> > > > > Jason Breitman
> > > > Jason Breitman
> > > Jason Breitman
> > Jason Breitman
> Jason Breitman
Jason Breitman
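The thread confirms the effective mount options by reading /proc/mounts. Examples include the acregmin/acregmax/acdirmax values resulting from actimeo=30, and lookupcache=pos. Below is a minimal Python sketch for pulling those options out per NFS mount. It assumes the standard six-field /proc/mounts layout (device, mount point, type, options, dump, pass). The function name `nfs_mount_options` is hypothetical.

```python
def nfs_mount_options(mounts_text):
    r"""Map each NFS mount point to its options, parsed from /proc/mounts text.

    Each /proc/mounts line has six whitespace-separated fields:
    device, mount point, filesystem type, options, dump, pass.
    Octal escapes in mount points (a space appears as \040) are not decoded.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for opt in fields[3].split(","):
            # Flags such as "rw" or "hard" carry no value; store None for them.
            key, sep, value = opt.partition("=")
            options[key] = value if sep else None
        result[fields[1]] = options
    return result
```

On a client, `nfs_mount_options(open("/proc/mounts").read())` would, for a hypothetical mount at /mnt/dir configured like the one in this thread, report `"pos"` for `lookupcache` and `"30"` for `acregmin`. Options left at their defaults, such as acdirmin in the snippet above, do not appear in /proc/mounts, so a missing key does not mean the option is unset.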

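The touch workaround described in the August 26 message amounts to setting a directory's modification time to "now" on the file server after an rsync push. Below is a minimal Python sketch of that step. It assumes it runs on the file server with write access to the affected directories, and the helper name `bump_dir_mtime` is hypothetical. The reporter observed that touching the directory clears the stale entries on clients; this sketch only reproduces the touch and does not verify that effect.

```python
import os


def bump_dir_mtime(directories):
    """Set atime and mtime of each directory to the current time, like `touch DIR`.

    Hypothetical helper for the workaround reported in this thread: run on the
    file server after an rsync push.  It does not check whether NFS clients
    actually drop their stale entries afterwards.
    """
    for path in directories:
        if not os.path.isdir(path):
            raise NotADirectoryError(path)
        # times=None (the default) means "now", the same as touch on an existing path.
        os.utime(path)
```

Setting the times explicitly with `os.utime` works even when the file server disables implicit access-time updates, since that setting only affects atime changes caused by reads.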