
Re: reading an empty directory after reboot is very slow



Quoting Vincent Lefevre (vincent@vinc17.net):
> On 2015-04-23 11:22:11 -0500, David Wright wrote:
> > Quoting Vincent Lefevre (vincent@vinc17.net):
> > > On 2015-04-23 11:31:27 +0200, Nicolas George wrote:
> > > > On quartidi 4 floréal, year CCXXIII, Vincent Lefevre wrote:
> > > > > David's test shows that the renamed file is missed.
> > > > 
> > > > No, it shows that the renamed file is NOT missed: he renamed the
> > > > entry for inode 497003 from file2 into a long name, and that
> > > > entry is returned, exactly once, under its old name. The "oops,
> > > > file2 stat() fails" only shows the race condition between
> > > > readdir() return and stat(); I am sure that if he printed
> > > > dirent.d_ino instead of stat.st_ino, it would have printed
> > > > 497003.
> > 
> > Yes, it would. See below as to why I printed the stat() version.

Well, that's what I thought, because of the caching. But Nicolas
asked me to try using thousands of files, and so here we are,
i.e., your "new test":

~ $ for j in `seq 10000` ; do mkdir /tmp/testdir/file$j ; done
~ $ ./a.out /tmp/testdir/ > lsout1
~ $ ./a.out /tmp/testdir/ > lsout2 ← here I renamed file2621
~ $ ./a.out /tmp/testdir/ > lsout3
~ $ wc lsout*
 10003  40009 237834 lsout1
 10002  40005 237809 lsout2 ← missing entry
 10003  40009 237890 lsout3
 30008 120023 713533 total
~ $ 

~ $ cat lsout1
1 I: 488780 .
2 I: 488641 ..
3 I: 872200 file1133
4 I: 920449 file5197
...
3262 I: 880852 file1940
3263 I: 896998 file3481
3264 I: 961277 file9216
3265 I: 945402 file7637
3266 I: 904859 file4197
3267 I: 879726 file1241
3268 I: 880694 file1782
3269 I: 945545 file7773
3270 I: 937554 file7267
sleeping
3271 I: 921316 file5736
3272 I: 945551 file7779
...
9998 I: 872083 file1016
9999 I: 937210 file6923
10000 I: 912739 file4685
10001 I: 888626 file2621
10002 I: 896549 file3032
~ $ 

~ $ cat lsout2
1 I: 488780 .
2 I: 488641 ..
3 I: 872200 file1133
4 I: 920449 file5197
...
3262 I: 880852 file1940
3263 I: 896998 file3481
3264 I: 961277 file9216
3265 I: 945402 file7637
3266 I: 904859 file4197
3267 I: 879726 file1241
3268 I: 880694 file1782
3269 I: 945545 file7773
3270 I: 937554 file7267
sleeping
3271 I: 921316 file5736
3272 I: 945551 file7779
...
9998 I: 872083 file1016
9999 I: 937210 file6923
10000 I: 912739 file4685
10001 I: 896549 file3032
~ $ 

~ $ cat lsout3
1 I: 488780 .
2 I: 488641 ..
3 I: 872200 file1133
4 I: 920449 file5197
...
3262 I: 880852 file1940
3263 I: 896998 file3481
3264 I: 888626 file2621file2621file2621file2621file2621file2621file2621file2621
3265 I: 961277 file9216
3266 I: 945402 file7637
3267 I: 904859 file4197
3268 I: 879726 file1241
3269 I: 880694 file1782
3270 I: 945545 file7773
sleeping
3271 I: 937554 file7267
3272 I: 921316 file5736
...
9998 I: 953898 file8808
9999 I: 872083 file1016
10000 I: 937210 file6923
10001 I: 912739 file4685
10002 I: 896549 file3032
~ $ 
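
For anyone who wants to compare listings like the three above by inode
rather than by eye, here is a minimal sketch in Python. The function
names parse_listing and missing_inodes are made up for illustration,
and the parser assumes names contain no whitespace. The sample lines
are copied from the tails of lsout1 and lsout2.

```python
def parse_listing(text):
    """Map inode -> name for lines shaped like '3264 I: 888626 file2621'."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the 'sleeping' marker, '...' elisions and blank lines.
        # Assumption: names contain no whitespace, so a real entry has 4 fields.
        if len(fields) != 4 or fields[1] != "I:":
            continue
        entries[int(fields[2])] = fields[3]
    return entries


def missing_inodes(before, during):
    """Inodes present in the 'before' listing but absent from 'during'."""
    return sorted(set(before) - set(during))


# Last three lines of lsout1 and last two lines of lsout2, as shown above.
lsout1_tail = """\
10000 I: 912739 file4685
10001 I: 888626 file2621
10002 I: 896549 file3032
"""
lsout2_tail = """\
10000 I: 912739 file4685
10001 I: 896549 file3032
"""

first = parse_listing(lsout1_tail)
second = parse_listing(lsout2_tail)
for ino in missing_inodes(first, second):
    print(ino, first[ino])   # prints: 888626 file2621
```

Keying on the inode rather than the name is what makes the comparison
survive a rename between runs.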

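So we have a file gone AWOL because it was renamed during this
program's execution: inode 888626 is listed as file2621 in lsout1
and under its new long name in lsout3, but lsout2 is one entry
short. (The I numbers here come from dirent.d_ino; there's no call
to stat().)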
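
The source of the lister is not shown in this thread, so the following
is only a rough Python stand-in for it, not the program that produced
the output above. It numbers the entries, prints the inode taken from
the directory entry itself, and pauses once partway through so that a
file can be renamed by hand. The names list_directory, pause_after and
pause_seconds are assumptions. Unlike raw readdir(), os.scandir() does
not yield . and .., so those two lines would not appear.

```python
import os
import sys
import time


def list_directory(path, pause_after=None, pause_seconds=0.0, out=sys.stdout):
    """Print 'N I: <inode> <name>' for each entry and return the entry count."""
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            count += 1
            # On Unix, DirEntry.inode() is taken from the directory entry,
            # so the name is never looked up again with stat().
            print(count, "I:", entry.inode(), entry.name, file=out)
            if count == pause_after:
                print("sleeping", file=out)
                out.flush()
                # Window in which a file can be renamed from another shell.
                time.sleep(pause_seconds)
    return count
```

A call such as list_directory("/tmp/testdir/", pause_after=3270,
pause_seconds=10) would mirror the runs above; the 10-second pause is
an arbitrary choice.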

> > > I only focused on the inode number and I thought that David's test was
> > > printing dirent.d_ino (which was the obvious thing to do). Now, after
> > > re-reading carefully what he said:
> > > 
> > >   "I scan the directory with readdir, then stat the file to obtain its
> > >                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >    inode number."
> > >    ^^^^^^^^^^^^
> > > 
> > > Indeed, this is the wrong way to do!
> > 
> > Well, the program I filched, because filch it I did, printed more
> > *interesting* stuff about the file, stuff like the size etc.
> > I switched it to printing the (to most people, boring) inode number
> > because, *for the purpose of this exercise*, the inode number acts as
> > a unique identifier for a file which has just changed name.
> 
> Except that you don't show anything special: it is obvious that there
> is a race condition on the filename when doing a file rename and
> reading the directory containing that file at the same time, i.e. one
> can either get the old filename or the new filename. This will be true
> with *any* filesystem (I mean, the behavior is not specified).

That's what I originally said. Here it is again, from your posting at
https://lists.debian.org/debian-user/2015/04/msg01157.html

    On 2015-04-15 14:13:43 -0500, David Wright wrote:
    > Quoting Kushal Kumaran (kushal@locationd.net):
    > > Moving entries around breaks ongoing readdir operations.  If a readdir
    > > has gone past the file being removed, and you moved the last entry
    > > there, the entry being moved would be missed, despite *it* not being the
    > > entry added or removed.
    > 
    > I don't think this matters. There's no guarantee that another process     ← me
    > isn't writing to that directory while you are working your way along
    > the entries.
    
    But with the current solution (no automatic moving of an entry), you        ← you
    can't miss an entry that hasn't been removed.

You cut the context of Kushal's remark. Bob wondered why, when an
entry in a simple directory is deleted, it is zeroed out rather than
taking the last entry and writing it in place of the deleted entry
(assuming it will fit; it would in DOS FAT with fixed-length names,
but might not if names varied in length). "Bob's move" would have the
effect of keeping the directory small under favourable circumstances,
and it was directory size/shrinkage that was the original concern of
this thread.

Kushal then said this would break ongoing readdir operations.
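
Kushal's point can be illustrated with a toy model. This is only a
sketch: the "directory" is a plain Python list of slots, which is not
how any real filesystem stores entries, and scan_with_delete and its
parameters are names invented for the illustration. A reader walks the
slots in order, and one entry is deleted when the reader reaches a
chosen position, either by blanking the slot or by moving the last
entry into it ("Bob's move").

```python
def scan_with_delete(names, delete_name, delete_when_at, compact):
    """Scan a list-backed toy directory while one entry is deleted mid-scan.

    compact=False: the deleted slot is blanked (None) and the reader skips it.
    compact=True:  the last entry is moved into the deleted slot.
    Returns the names the reader saw, in order.
    """
    slots = list(names)
    seen = []
    pos = 0
    while pos < len(slots):
        if pos == delete_when_at:
            victim = slots.index(delete_name)
            if compact:
                last = slots.pop()
                if victim < len(slots):      # victim was not the last entry itself
                    slots[victim] = last
            else:
                slots[victim] = None
            delete_when_at = -1              # delete only once
            continue                         # re-check bounds: the list may have shrunk
        if slots[pos] is not None:
            seen.append(slots[pos])
        pos += 1
    return seen


names = ["a", "b", "c", "d", "e"]
# The reader has already passed "b" when "b" is deleted.
print(scan_with_delete(names, "b", 3, compact=False))  # ['a', 'b', 'c', 'd', 'e']
print(scan_with_delete(names, "b", 3, compact=True))   # ['a', 'b', 'c', 'd']
```

In the second run "e" was never deleted, yet the reader misses it,
because it was moved into a slot the reader had already passed.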

I said it doesn't matter: they're broken already, simply because
other processes could be modifying the directory at the same time.

You said that without "Bob's move", you couldn't miss an entry that
hasn't been removed.

I've demonstrated that you can.

> > > So, there's a need for a new test.
> > 
> > Feel free to write it.
> 
> No need, you have agreed above that the dirent.d_ino is correct
> (except in case of programming error).

My agreement was based on the small test case. (I admit I should have
printed the dirent.d_ino and not stat.st_ino to see whether readdir
fails to see the entry.) But *my agreement* is not a good way of
judging whether something is right or wrong!

One last observation. How come . and .. now appear first and second,
whereas they were scattered around in the small test cases? That
surprised me.

Cheers,
David.

