
Re: reading an empty directory after reboot is very slow



Quoting Vincent Lefevre (vincent@vinc17.net):
> On 2015-04-21 11:05:58 -0500, David Wright wrote:
> > Quoting Vincent Lefevre (vincent@vinc17.net):
> > > On 2015-04-20 13:04:41 -0500, David Wright wrote:
> > > > Quoting Vincent Lefevre (vincent@vinc17.net):
> > > > > But with the current solution (no automatic moving of an entry), you
> > > > > can't miss an entry that hasn't been removed.
> > [...]
> > > > ...so if you happen to be reading the entry for file5 at the
> > > > time I typed mv, you'll get the entry for file4 twice, under
> > > > two different names. (Or the opposite.)
> > > 
> > > OK, so, if the rename(2) system call can reorder the entries (this is
> > > not quite clear because one doesn't see the empty entries here),
> > 
> > No, and you wouldn't *normally* see them with readdir, I'd suppose.
> 
> But this matters for the implementation in the kernel.

What's "this"? And what does it matter? You make some system calls,
and you get replies. They come out of a black box.

> > > then
> > > there is already a problem with the file system. Getting an entry
> > > twice under different names is not much of a problem, IMHO, because one
> > > can look at the inode number; there's a race condition, but at worst,
> > > one can just miss a *new* inode (whose number has been reassigned).
> > > Missing an existing entry is a problem.
> > 
> > ...easily demonstrated with

[old demonstration snipped]
> > 
> > where file3 goes AWOL.
> 
> You haven't demonstrated anything. If you have before the mv:
> 
> 0: file1
> 1: file4
> 2: file5
> 3: file6
> 4: file2
> 5: file3
> 
> and after the mv:
> 
> 0: file1
> 1: file4
> 2: file5
> 3: [empty]
> 4: [empty]
> 5: file3file3file3file3file3file3file3file3file3file3file3file3file3
> 6: file6
> 7: file2
> 
> there's no problem.

So you don't believe the problem when it's demonstrated, but you do
believe some hypotheticals you just made up. Ask yourself why an
efficient filesystem would move a load of directory entries just
because someone renamed a file.

> What actually needs to be done is a real test
> using readdir.

OK. Here's a demonstration of a file going AWOL by moving *up* the
directory listing. Because of read-ahead, readdir still sees the old
name, and the stat() fails. Again because of read-ahead, I can't
demonstrate the opposite effect in the same program: you'd need a
directory bigger than the read-ahead buffer in order to see any effect.
But, as was said already, its occurrence can be discovered by checking
the inode numbers for duplicate returns.
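For illustration only, here is a minimal sketch of that duplicate check, assuming Python 3's os.scandir(), which on POSIX systems iterates with readdir(3) and exposes each entry's d_ino through DirEntry.inode(). The function name find_duplicate_inodes is hypothetical. Note that hard links legitimately share an inode, so a hit is a candidate for a rename race, not proof of one.

```python
import os
from collections import defaultdict


def find_duplicate_inodes(path):
    """Scan a directory once and report inode numbers returned more than once.

    os.scandir() reads entries with readdir(3) on POSIX, and DirEntry.inode()
    returns the d_ino field from that call, so no extra stat() is needed.
    """
    names_by_inode = defaultdict(list)
    with os.scandir(path) as it:
        for entry in it:
            names_by_inode[entry.inode()].append(entry.name)
    # A file renamed mid-scan can show up under its old and new names with
    # the same inode; hard links produce the same pattern legitimately.
    return {ino: names for ino, names in names_by_inode.items() if len(names) > 1}
```

An empty result means no inode was seen twice during that particular pass.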

I scan the directory with readdir, then stat the file to obtain its
inode number. E is stat's return code and I is the inode number.
When the inode number matches 497051 (file6), I sleep for 5 seconds
so that another process can rename a file.
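The source of a.out isn't shown, so the following is only a hypothetical reconstruction of a program that behaves as described, written in Python rather than C. The function name scan, its parameters and the script name scan.py are assumptions. One visible difference: os.scandir() omits "." and "..", so those two lines would not appear in its output.

```python
import os
import sys
import time


def scan(path, pause_inode=None, pause_seconds=5, out=sys.stdout):
    """Hypothetical reconstruction of the readdir/stat test described above.

    Prints a running count, stat's result (0 or -1) and the inode number for
    each entry; after the entry whose inode equals pause_inode it prints
    "sleeping" and waits, giving another process time to rename a file.
    """
    rows = []
    ino = 0
    # os.scandir() wraps opendir()/readdir() on POSIX but skips "." and "..".
    with os.scandir(path) as it:
        for count, entry in enumerate(it, start=1):
            try:
                ino = os.stat(entry.path).st_ino
                rc = 0
            except OSError:
                rc = -1  # ino keeps the previous entry's value, as in the post
            rows.append((count, rc, ino, entry.name))
            print(f"{count} E: {rc} I: {ino} {entry.name}", file=out)
            if rc == 0 and ino == pause_inode:
                print("sleeping", file=out)
                time.sleep(pause_seconds)
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 scan.py /tmp/testdir/ 497051
    target = int(sys.argv[2]) if len(sys.argv) > 2 else None
    scan(sys.argv[1], pause_inode=target)
```

Because glibc's readdir() fetches entries in batches, a rename made during the sleep may not be visible to the rest of the scan, which is the read-ahead effect mentioned above.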

~ $ for j in 1 2 3 4 5 6 ; do mkdir /tmp/testdir/file$j ; done

~ $ /tmp/a.out /tmp/testdir/ ← before doing anything
1 E: 0 I: 496992 file1
2 E: 0 I: 497007 file4
3 E: 0 I: 497039 file5
4 E: 0 I: 488682 .
5 E: 0 I: 497051 file6
sleeping ← I give myself 5 seconds to do something
6 E: 0 I: 488641 ..
7 E: 0 I: 497003 file2
8 E: 0 I: 497006 file3

~ $ /tmp/a.out /tmp/testdir/ ← during the alteration
1 E: 0 I: 496992 file1
2 E: 0 I: 497007 file4
3 E: 0 I: 497039 file5
4 E: 0 I: 488682 .
5 E: 0 I: 497051 file6
sleeping                ← here I renamed file2 (in another xterm)
6 E: 0 I: 488641 ..
7 E: -1 I: 488641 file2 ← oops, file2 stat() fails (so the inode number is untouched from the previous call)
8 E: 0 I: 497006 file3

~ $ /tmp/a.out /tmp/testdir/ ← after the alteration
1 E: 0 I: 496992 file1
2 E: 0 I: 497007 file4
3 E: 0 I: 497039 file5
4 E: 0 I: 488682 .
5 E: 0 I: 497003 file2file2file2file2file2file2file2file2file2file2file2file2file2file2 ← here it is
6 E: 0 I: 497051 file6
sleeping
7 E: 0 I: 488641 ..
8 E: 0 I: 497006 file3
~ $ 

> Any idea of the algorithm to choose the directory entries? The fact
> that the files are not ordered initially is unintuitive.

A hashing function, so I guess one reads that as "random".
Oh, oh, I'd better be careful what I say. "Pseudorandom", as it's
deterministic. I get the same sequence every time I make those
files.
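To show why a deterministic hash gives an order that looks random, here is a toy sketch. It uses CRC-32 from Python's zlib module with a made-up seed as a stand-in; it is not the hash any real filesystem uses, and the names toy_hash, hash_order and SEED are hypothetical.

```python
import zlib

# Hypothetical seed; a real hashed-directory filesystem keeps its own.
SEED = b"per-filesystem-seed"


def toy_hash(name, seed=SEED):
    # Stand-in hash: CRC-32 over seed + name. Deterministic, but the
    # resulting values bear no relation to alphabetical order.
    return zlib.crc32(seed + name.encode())


def hash_order(names, seed=SEED):
    # Names sorted by hash (ties broken by name): roughly how a
    # hash-indexed directory hands entries back, whatever the creation order.
    return sorted(names, key=lambda n: (toy_hash(n, seed), n))
```

Calling hash_order(["file1", "file2", "file3", "file4", "file5", "file6"]) gives the same sequence every time, and a renamed file gets a new hash, so it can land anywhere in that sequence.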

> > > What do the backup systems do?
> > 
> > I don't know. Lock the directory and slurp it (if it's not too big),
> > otherwise check the modification time before and after reading it,
> > and reread it if necessary, maybe...
> 
> The modification time mustn't be used since it can be changed by some
> tools, e.g. during unarchiving or decompressing. The ctime could be
> OK, but re-reads can introduce endless loops for directories that are
> constantly modified.
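As a rough illustration of the ctime-based idea with a bound on re-reads, here is a hypothetical sketch in Python; read_dir_consistently and max_tries are invented names, not what any backup tool is known to do. It relies on the fact that utime() and touch can set mtime but not ctime, which the kernel updates on every inode change. Timestamp granularity is a limitation: two changes within one clock tick may leave ctime unchanged, so a "consistent" result is a hint, not a guarantee.

```python
import os


def read_dir_consistently(path, max_tries=5):
    """Hypothetical: list a directory, retrying if its ctime changes mid-read.

    Returns (names, consistent). Adding, removing or renaming an entry
    updates the directory's ctime, so a change between the two stat() calls
    triggers a rescan. max_tries bounds the loop so a directory that is
    constantly modified cannot make it spin forever.
    """
    names = []
    for _ in range(max_tries):
        before = os.stat(path).st_ctime_ns
        names = sorted(os.listdir(path))
        after = os.stat(path).st_ctime_ns
        if before == after:
            return names, True
    return names, False
```

When it gives up, the caller still gets the last listing, flagged as possibly inconsistent.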

What's going on here? Is "What do the backup systems do?" an Aunt Sally,
so that you can take pot shots at any suggestion? If you want to know what
backup systems do, or how filesystems "choose the directory entries",
then you have the same access to the source as I do. Take a look.
But don't expect me to come up with a bullet-proof scheme.

This subthread started at https://lists.debian.org/debian-user/2015/04/msg01157.html
with your statement "But with the current solution (no automatic
moving of an entry), you can't miss an entry that hasn't been removed."

I disagreed, giving evidence. Take it or leave it. Why should I care?

Cheers,
David.

