[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: reading an empty directory after reboot is very slow

David Wright wrote:
> Quoting Kushal Kumaran:
> > Bob Proulx writes:
> > > If a directory became full it was easy to extend it
> > > by writing the array longer.  But if an early entry in the array was
> > > deleted the system would zero it out rather than move each and every
> > > entry in the file system down a slot.  (I always wondered why they
> > > didn't simply take the *last* entry and move it down to the deleted
> > > entry and simply keep the array always compacted.  I wonder.  But they
> > > didn't do it that way.)
> I think the reason for this is that the entries have different lengths
> corresponding to the filenamelength, so you'd have to search for a slot
> small enough. Were this slot not the last entry, then keep repeating...
> I think I can see a pathological end-case here.

But for many years filename slots in directories were of fixed sizes
and yet they still did not move entries around.  They just renamed
them in place.  There wouldn't have been a pathological case in the
old fixed length slots.  I have forgotten how the longer BSD namestyle
slots work.  It might be pathological there.  But those came later.
However that might contribute weight into why it isn't typically done.

> Once you have trees, I'm out of my depth. But I read that trees have
> to be balanced, which may mean a whole new set of algorithms for
> pruning them.

I have worked extensively with trees in memory.  Not so much with
B-trees on disk.  But in both cases there is a big difference between
insertion and deletion.  I should go back and refresh my memory on the
code but I am going to take a risk and just blurt this out anyway.
Rebalancing trees after insertion is always a relatively localized
operation.  AVL trees do it by brute force and there are various
techniques such as 2-3-4 trees and Red-Black that are clever
alternative models.  Basically you prevent the tree from being
pathologically unbalanced and can then always tolerate a little
imbalance so the rotations needed are smaller.

Rebalancing trees after deletion on the other hand is completely
different from rebalancing after insertion.  A deletion may cause a
large rebalancing of a large portion of the tree.  In memory it
probably won't ever get really bad.  Also deletions tend to be less
frequent than insertions for a program using an in memory data
structure.  But for a large B-tree on disk (which I haven't
implemented myself yet) I could imagine it requiring a significant
number of disk operations.  For a directory the number of deletions is
on average the same as insertions.  Deletions are just as common as
insertions.  Marking them empty is likely a significant savings.  It
is only the "industrial strength" file systems such as xfs (and
probably jfs and others) that go the extra code to clean up and
actually delete the entries.

> which may mean a whole new set of algorithms for pruning them.

I wanted to emphasize this.  Yes.  Insertion is easier than deletion.
Deletion requires a significant amount of additional code to handle
properly.  In text books deletion is often left as an exercise to the
reader.  Why?  Because its much harder and less often required.  The
author won't have wanted to delve that deeply into that case.

And as additional commentary when it is in memory for a program if
there is a bug in deletion you can restart the program.  When it is a
file system structure then you would probably lose file data and
require an extensive fsck to repair.  That is much more painful.  I
don't fault file system authors for not implementing deletion to the
same level since it is riskier during development until you get to
that "bug free state".  (Does any software ever truly get there?)

> > Moving entries around breaks ongoing readdir operations.  If a readdir
> > has gone past the file being removed, and you moved the last entry
> > there, the entry being moved would be missed, despite *it* not being the
> > entry added or removed.

I think Kushal is exactly correct.  I had wondered about it but after
the above reasoning I think that completely explains it.  Thank you
Kushal for this rationale.

> I don't think this matters. There's no guarantee that another process
> isn't writing to that directory while you are working your way along
> the entries.

True.  If this is part of a process and becomes a critical section
then the entire process should semaphore access to it.  A typical Unix
way would be to write a daemon to access it and all other processes go
through that daemon which has the same effect of a semaphore simply
because there is now only one process accessing the resource.  This is
effectively the way databases operate.

> This whole discussion touches on one of the facts of life: people
> generally design things for extending, not for contracting. Ability to
> extend a design is an important criterion in its success. In the field
> of computers this is often coupled with backwards compatibility, so you
> can keep the old design going.

Agreed.  I think this comment is very insightful.

Thanks for all of the awesome test cases and data.  Really good stuff!


Attachment: signature.asc
Description: Digital signature

Reply to: