
Bug#1017720: nfs-common: No such file or directory



> -----Original Message-----
> From: Ben Hutchings <ben@decadent.org.uk>
> Sent: Friday, August 19, 2022 7:27 PM
> To: Jason Breitman <jbreitman@tildenparkcapital.com>;
> 1017720@bugs.debian.org
> Subject: Re: Bug#1017720: nfs-common: No such file or directory
> 
> Control: tag -1 moreinfo
> 
> On Fri, 2022-08-19 at 13:16 +0000, Jason Breitman wrote:
> > Package: nfs-common
> > Version: 1:1.3.4-6
> > Severity: important
> >
> > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > GNU/Linux
> >
> > -- Description
> >     After updating and or creating new files on our file server via
> > rsync, we see many files report the error message below from NFSv4
> > clients since upgrading from Debian 10.8 to Debian 11.4.
> >     Clearing the dentry cache resolves the issue right away.
> >     I am not sure that nfs-common is the package to blame, but listed
> > it based on the bug submission recommendations.
> 
> The NFS implementation is mostly in the kernel, so probably this issue
> belongs there.  But the kernel team is responsible for both packages.
> 
> [...]
> > -- Error message
> >     ls: cannot access 'filename': No such file or directory
> >     -????????? ? ?    ?            ?            ? filename
> [...]
> 
> So we know the file's there but can't stat it.  I think this means the
> client has cached the handle of the old file of that name, which has
> been deleted.
> 
> - Are client and server clocks closely synchronised?  If not, that
> needs to be fixed.
> 
The clocks are synchronized using NTP.  

> - Are clients likely to read this directory while rsync is running, or
> shortly before?  If so, it may help to reduce the attribute caching
> timeout on the client.  See the "Directory entry caching" section in
> the nfs(5) manual page.
>
In the observed cases, clients were not likely to be reading this directory while rsync was running.  That can happen in our environment, but it did not here.
I am using the lookupcache=pos mount option.  I tried noac, but the performance penalty was too great.  Which option are you referring to, and what setting do you recommend testing?
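
To compare the caching settings actually in effect on clients in each environment, a minimal sketch along these lines could pull the relevant options out of a /proc/mounts line.  The function name and the sample mount line are hypothetical; the option names are the ones documented in nfs(5).  Options left at their default values may not appear in /proc/mounts at all, so an empty result does not mean caching is disabled.

```python
# Mount options from nfs(5) that affect attribute and directory entry caching.
CACHE_OPTIONS = {
    "ac", "noac", "actimeo",
    "acregmin", "acregmax", "acdirmin", "acdirmax",
    "lookupcache",
}


def nfs_cache_options(mounts_line):
    """Return the caching-related options from one /proc/mounts line.

    Returns None when the line does not describe an NFS mount.
    """
    fields = mounts_line.split()
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
        return None
    found = {}
    for opt in fields[3].split(","):
        key, _, value = opt.partition("=")
        if key in CACHE_OPTIONS:
            # Flag-style options such as noac carry no value.
            found[key] = value if value else True
    return found
```

Running this over each line of /proc/mounts on one client from each environment would show whether the two really mount with the same caching options.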

> I don't know why you're only seeing this after an upgrade of the
> clients, though.  I'm not aware that there has been any big change to
> attribute caching.
> 
I appreciate you responding to my report and am happy to answer any questions.
We have multiple monitors and log scrapers that detect "file not found" exceptions, so we would have known if this had been happening before the upgrade.
To share more, I have 2 environments mounting from the same file server.  Each environment has several servers.  The issue is only seen in the environment running Debian 11.4.
I also should have mentioned that the files in question have a version number appended, for example filename-1111.  When the file is updated via rsync, the new file is called filename-1112 and the prior file is removed.  The error is about filename-1111.
I am not sure if this is the proper terminology, but the issue appears to be the negative dentry cache.
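
To catch the symptom as it happens (a name that shows up in the directory listing but cannot be stat'ed), a rough sketch such as the following could be run on an affected client.  The function name is made up for illustration, and this only detects the condition; it says nothing about the cause.

```python
import os


def unstattable_entries(directory):
    """Return names that are listed in the directory but fail lstat with ENOENT.

    This mirrors the 'ls -l' output where the name is printed but every
    attribute is shown as '?'.
    """
    missing = []
    for name in os.listdir(directory):
        try:
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            missing.append(name)
    return sorted(missing)
```

On a healthy directory this returns an empty list.  Logging its output together with a timestamp right after each rsync run might help correlate the errors with the file replacement.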

> Ben.
> 
> --
> Ben Hutchings
> Beware of bugs in the above code;
> I have only proved it correct, not tried it. - Donald Knuth

Jason Breitman
