
Bug#599823: [Fwd: Re: Bug#599823: linux-2.6: XEN and NFS causes duplicate filenames with large directories]



-------- Forwarded Message --------
From: Jason Kendall <jakendall@gmail.com>
To: Ben Hutchings <ben@decadent.org.uk>
Subject: Re: Bug#599823: linux-2.6: XEN and NFS causes duplicate filenames with large directories
Date: Mon, 11 Oct 2010 14:50:39 -0400


On 10-10-11 01:19 PM, Ben Hutchings wrote:
> On Mon, Oct 11, 2010 at 12:49:33PM -0400, Jason Kendall wrote:
>    
>> Package: linux-2.6
>> Severity: important
>> Tags: upstream
>>      
> Which version?
>    
The uname output was further down in the report (I used reportbug, so it
should have been there). At the time of the report it was 2.6.32-5-686-bigmem.
>> 2. Duplicate filenames are given when doing an "ls"
>> 3. Trigger happens when a rename (mv) happens on a directory with a large number of files.
>> 4. Does not matter which machine does the rename/mv (Any box connected to the NFS) the duplicate filenames still show up under DomU
>> 5. Does not appear to happen to directories with a limited number of files. I have one directory with > 9k files which this does happen on (mail directory)
>>      
> This is probably an effect of the NFS block size - any directory smaller
> than a single block is likely to be readable atomically.
>
>    
I increased the block size and the issue is the same. The mount options are now:
(rw,rsize=32768,wsize=32768,hard,fg,nolock,nfsvers=3,tcp,actimeo=0,addr=10.0.0.7).

Previously it was mounted with the defaults.
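
Since the effective values on a mount can differ from the requested ones, it may be worth comparing the option string above with what /proc/mounts reports for the same mount point. A minimal Python sketch for splitting such a string into key/value pairs follows; parse_mount_options is a hypothetical helper name, not something from the original report.

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (for example "hard") map to True.
    Values are kept as strings.
    """
    options = {}
    # Tolerate the surrounding parentheses used when the options are quoted.
    for item in option_string.strip().strip("()").split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


opts = parse_mount_options(
    "rw,rsize=32768,wsize=32768,hard,fg,nolock,nfsvers=3,tcp,actimeo=0,addr=10.0.0.7"
)
print(opts["rsize"], opts["wsize"])  # prints: 32768 32768
```

Running the same function over the options field of the matching /proc/mounts line would show whether rsize and wsize really took effect.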

>> 6. To clear the issue, you have to either rename the file back to the original, or reboot the DomU
>>      
> This last point is the troubling one.  If this condition was transient I
> would be tempted to say it's not a bug.  It sounds like the client treats
> its version of the directory as being correct as of the time the directory
> listing was completed, whereas it should either (1) treat the listing as
> correct at the time the directory listing started, therefore stale when
> the directory is next read; or (2) detect that the directory changed and so
> discard the listing from its cache immediately.
>
>    
>> A little direction on how to continue diagnosing this issue, or a fix :) would be good.
>>      
>
> Please test Linux 2.6.36-rc6 as packaged in experimental.
>
>
>    
Just tested, same issue.

Looking at a pcap, the NFS client doesn't appear to be asking the server for
the filenames; however, there are a large number of "ACCESS" and
"GETATTR" requests. Most replies are returned as "Directory" and a few
as "Regular File". Three Regular File replies are returned, all with the
same file handle and apparently the same stats. There is a matching
GETATTR call prior to each Regular File reply, and a number of other
requests in between each one.

Touching the file to update the mtime does not resolve the issue.

I can't seem to find a way to force an NFS cache flush.
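
One generic Linux mechanism that might be worth trying is /proc/sys/vm/drop_caches, which asks the kernel to drop clean page cache, dentries and inodes. Whether this clears the duplicate entries seen here is untested and only an assumption. A minimal Python sketch, with drop_caches and its control_file parameter being hypothetical names chosen for illustration:

```python
import os


def drop_caches(level=3, control_file="/proc/sys/vm/drop_caches"):
    """Ask the kernel to drop clean caches.

    level 1: page cache, 2: dentries and inodes, 3: both.
    Writing to the real control file requires root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # flush dirty data first so more of the cache can be dropped
    with open(control_file, "w") as handle:
        handle.write(f"{level}\n")
```

Calling drop_caches() as root on the DomU and then repeating the ls would show whether the stale listing lives in these caches at all.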

For the record:

root@mx2:/home/jakendall/Maildir# ls cur/127840* -l
-rw------- 1 jakendall users 79124 Jul  6 04:10 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Jul  6 04:10 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Jul  6 04:10 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
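
To check a whole directory rather than a single glob, a short Python sketch like the one below could count every name that a listing returns more than once. It relies only on os.listdir, which returns whatever entries the client's readdir produces; the function names are hypothetical and not part of the original report.

```python
import os
from collections import Counter


def count_duplicates(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def find_duplicate_names(directory):
    """List a directory once and report names that appear repeatedly."""
    return count_duplicates(os.listdir(directory))
```

On a healthy filesystem find_duplicate_names("cur") should return an empty dict; on the affected DomU it would be expected to list the file above with a count of 3.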


Doing a umount / mount on the drive doesn't clear it out either:

root@mx2:/home/jakendall/Maildir# cd /
root@mx2:/# umount /home
root@mx2:/# mount /home
root@mx2:/# cd /home/jakendall/Maildir/
root@mx2:/home/jakendall/Maildir# cd /home/jakendall/Maildir/
root@mx2:/home/jakendall/Maildir# ls cur/127840* -l
-rw------- 1 jakendall users 79124 Oct 11 14:33 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Oct 11 14:33 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
-rw------- 1 jakendall users 79124 Oct 11 14:33 cur/1278403851.H569630P9192.mx1.ostlabs.com:2,Sa
root@mx2:/home/jakendall/Maildir#

The pcap taken after this shows the READDIR calls for the directory, and the
file shows up once in each of 3 different Call/Reply pairs.
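
The same check can be applied to a capture. Assuming the entry names from each READDIR reply have already been extracted into one list per reply (by hand or with any dissector), this Python sketch reports which names occur in more than one reply. The function name and the input layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def names_in_multiple_replies(replies):
    """Find names that occur in more than one READDIR reply.

    replies: list of lists of entry names, one inner list per reply,
    in capture order. Returns {name: [reply indexes]}.
    """
    seen = defaultdict(list)
    for index, names in enumerate(replies):
        for name in set(names):  # count each reply at most once per name
            seen[name].append(index)
    return {name: idxs for name, idxs in seen.items() if len(idxs) > 1}
```

A name that maps to three reply indexes would match what the capture shows, and would suggest the duplicates are already present in the server's replies rather than being created by the client's cache.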

Regards,
Jason




