
Bug#599823: Fwd: Bug#599823: linux-2.6: XEN and NFS causes duplicate filenames with large directories



Oops, didn't reply all.

---------- Forwarded message ----------
From: Jason Kendall <jakendall@gmail.com>
Date: Mon, Oct 11, 2010 at 2:50 PM
Subject: Re: Bug#599823: linux-2.6: XEN and NFS causes duplicate filenames with large directories
To: Ben Hutchings <ben@decadent.org.uk>




On 10-10-11 01:19 PM, Ben Hutchings wrote:
On Mon, Oct 11, 2010 at 12:49:33PM -0400, Jason Kendall wrote:
 
Package: linux-2.6
Severity: important
Tags: upstream
   
Which version?
 
uname was further down in the report (I used reportbug, so it should have been there). At the time of the report it was 2.6.32-5-686-bigmem.

2. Duplicate filenames are given when doing an "ls"
3. Trigger happens when a rename (mv) happens on a directory with a large number of files.
4. Does not matter which machine does the rename/mv (Any box connected to the NFS) the duplicate filenames still show up under DomU
5. Does not appear to happen to directories with a limited number of files. I have one directory with > 9k files which this does happen on (mail directory)
   
This is probably an effect of the NFS block size - any directory smaller
than a single block is likely to be readable atomically.

 
I increased the block size and the issue is the same. (rw,rsize=32768,wsize=32768,hard,fg,nolock,nfsvers=3,tcp,actimeo=0,addr=10.0.0.7).

Prior, it was just mounted with defaults


6. To clear the issue, you have to either rename the file back to the original, or reboot the DomU
   
This last point is the troubling one.  If this condition was transient I
would be tempted to say it's not a bug.  It sounds like the client treats
its version of the directory as being correct as of the time the directory
listing was completed, whereas it should either (1) treat the listing as
correct at the time the directory listing started, therefore stale when
the directory is next read; or (2) detect that the directory changed and so
discard the listing from its cache immediately.
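
The two options above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's NFS client code: FakeServer, DirCache, change_counter and during_read are all hypothetical names invented for the example, and the counter stands in for whatever change information the server exposes for the directory.

```python
class FakeServer:
    """Hypothetical stand-in for an NFS server exporting one directory."""

    def __init__(self, names):
        self.names = list(names)
        self.change_counter = 0   # bumped on every directory modification
        self.during_read = None   # optional callback fired midway through a read

    def rename(self, old, new):
        self.names[self.names.index(old)] = new
        self.change_counter += 1

    def read_all(self):
        # Read the directory in two chunks, like a multi-block READDIR.
        half = len(self.names) // 2
        first = self.names[:half]
        if self.during_read:
            hook, self.during_read = self.during_read, None
            hook()                # simulate a rename landing between chunks
        return first + self.names[half:]


class DirCache:
    """Toy client-side cache of one directory listing."""

    def __init__(self, server):
        self.server = server
        self.listing = None
        self.valid_as_of = None   # counter value the cached listing is tied to

    def readdir(self):
        # Option (1): tie validity to the counter seen when the read STARTED.
        started_at = self.server.change_counter
        if self.listing is not None and self.valid_as_of == started_at:
            return self.listing
        listing = self.server.read_all()
        if self.server.change_counter == started_at:
            self.listing, self.valid_as_of = listing, started_at
        else:
            # Option (2): the directory changed mid-read, so do not cache it.
            self.listing, self.valid_as_of = None, None
        return listing
```

With either rule in place, a listing that raced with a rename may be returned once, but it is never kept, so the next read goes back to the server.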

 
A little direction on how to continue diagnosing this issue, or a fix :) would be good.
   

Please test Linux 2.6.36-rc6 as packaged in experimental.


 
Just tested, same issue.

Looking at a pcap, the NFS client doesn't appear to be asking the server for the filenames; however, there are a large number of "ACCESS" and "GETATTR" requests.  Most are returned as "Directory", a few are returned as "Regular File". Of the Regular File replies, there are 3 returned, all with the same file handle and what appear to be the same stats. There are matching GETATTR calls prior to each Regular File reply, and a number of requests in between each one.

Touching the file to update the mtime does not resolve the issue.

I can't seem to find a way to force an NFS cache flush.

For the record:

root@mx2:/home/jakendall/Maildir# ls cur/127840* -l
-rw------- 1 jakendall users 79124 Jul  6 04:10 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Jul  6 04:10 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Jul  6 04:10 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa


Doing a umount / mount on the drive doesn't clear it out either:

root@mx2:/home/jakendall/Maildir# cd /
root@mx2:/# umount /home
root@mx2:/# mount /home
root@mx2:/# cd /home/jakendall/Maildir/
root@mx2:/home/jakendall/Maildir# cd /home/jakendall/Maildir/
root@mx2:/home/jakendall/Maildir# ls cur/127840* -l
-rw------- 1 jakendall users 79124 Oct 11 14:33 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Oct 11 14:33 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Oct 11 14:33 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
root@mx2:/home/jakendall/Maildir#

The pcap taken after this shows the READDIR for the file, and the file shows up in 3 different Call/Reply pairs, once in each.
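
For anyone wanting to check a large directory for this without reading through ls output, here is a minimal sketch. The helper names find_duplicates and duplicate_names are made up for this example; the only assumption about the system is that os.listdir passes through whatever names the directory read returns, repeats included.

```python
import os
from collections import Counter


def find_duplicates(names):
    """Return {name: count} for every name that appears more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def duplicate_names(path):
    """List a directory and report any filename returned more than once."""
    return find_duplicates(os.listdir(path))


if __name__ == "__main__":
    import sys

    # Usage: python finddups.py /home/jakendall/Maildir/cur
    for name, n in sorted(duplicate_names(sys.argv[1]).items()):
        print(f"{n}x {name}")
```

On a healthy filesystem this prints nothing; on the affected DomU it would be expected to report the repeated Maildir entry.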

Regards,
Jason


