Topological sorting and data structures
Hi,
Algorithms:
I have been reading up on directed acyclic graphs and
topological sorting. There are two methods to do the sort that I can
see
a) Do a depth-first search, printing each node's name after
traversing its children (so each node occurs after all its
dependencies are listed; reverse the order when printing)
Algorithms [Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest]
Algorithms in C [Robert Sedgewick]
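Method (a) might be sketched as below. This is only an illustration, assuming
a numeric adjacency list where adj[u] lists the packages that depend on u
(so the reversed post-order lists every package before its dependents); the
names are mine, not anything from dpkg.

```cpp
#include <vector>
#include <algorithm>

// Method (a): DFS, emit each node after its children, then reverse.
// adj[u] lists the nodes that depend on u (illustrative layout).
static void dfs(int u, const std::vector<std::vector<int>>& adj,
                std::vector<int>& color, std::vector<int>& out) {
    color[u] = 1;                       // grey: in progress
    for (int v : adj[u])
        if (color[v] == 0)              // white: not yet visited
            dfs(v, adj, color, out);
    color[u] = 2;                       // black: finished
    out.push_back(u);                   // post-order emission
}

std::vector<int> topo_sort(const std::vector<std::vector<int>>& adj) {
    std::vector<int> color(adj.size(), 0), out;
    for (int u = 0; u < (int)adj.size(); ++u)
        if (color[u] == 0)
            dfs(u, adj, color, out);
    std::reverse(out.begin(), out.end()); // dependencies now come first
    return out;
}
```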
b) Look at all nodes with no dependencies and list them in any
order; for each node listed, find the packages that depend on it and
decrement their dependency counts by one. Repeat with all nodes whose
count has dropped to zero.
Algorithms [Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest]
Discrete Mathematics and its applications [Kenneth H. Rosen]
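Method (b) is what the literature calls Kahn's algorithm; a minimal sketch,
again assuming the same hypothetical adjacency layout (dependents[u] lists
the packages that depend on u, indegree[u] counts u's outstanding
dependencies):

```cpp
#include <vector>
#include <queue>

// Method (b): repeatedly emit nodes whose dependency count is zero,
// decrementing the counts of their dependents as we go.
std::vector<int> kahn_sort(const std::vector<std::vector<int>>& dependents,
                           std::vector<int> indegree) {
    std::queue<int> ready;
    for (int u = 0; u < (int)indegree.size(); ++u)
        if (indegree[u] == 0) ready.push(u);
    std::vector<int> order;
    while (!ready.empty()) {
        int u = ready.front(); ready.pop();
        order.push_back(u);
        for (int v : dependents[u])
            if (--indegree[v] == 0)     // last dependency satisfied
                ready.push(v);
    }
    // If order.size() < indegree.size(), the leftover nodes form cycles.
    return order;
}
```

Note that the leftover nodes give a natural hook for the cycle handling
discussed below.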
Both algorithms can work in O(V+E), where V is the number of
nodes (packages), and E is the number of edges (dependencies). (I'm
still struggling with an exercise in Algorithms [Thomas H. Cormen,
Charles E. Leiserson, Ronald L. Rivest] which suggests that one may
determine cycles in an undirected graph using an O(V) algorithm).
Both these algorithms shall have to deal with cycles, possibly
by cutting links such that the cycle disappears. (In the first
algorithm, refuse to visit a node twice (colour the node as needed);
in the second, remove a dependency and decrement the in-degree of the
lowest-degree node, until it reaches 0 and one can try again.)
I'd also like to special-case strongly connected regions of
the graph, since it is quite likely that the only cycles we shall
ever see occur within strongly connected segments (package A depends
on package B; package B also depends on package A).
Most algorithmic methods for determining the strongly connected
regions of a graph are based on a depth-first search, hence my
predilection for that one algorithm, even though one gets fewer
choices about which edge to snip to resolve cycles. I hope that if we
coalesce strongly connected regions and install all those packages in
one run of dpkg, we won't have that many cycles to contend with.
Representation:
Since the number of dependencies for any package is far
smaller than the total number of packages, an adjacency list is a far
more efficient way of representing the graph. This is already very
close to the data structure we use.
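For concreteness, the adjacency-list representation I have in mind looks
something like this (the struct and field names are purely illustrative,
not dpkg's actual structures):

```cpp
#include <string>
#include <vector>

// One node per package; dependencies stored as indices into the graph,
// so total space is O(V + E).
struct Package {
    std::string name;
    std::vector<int> depends;    // packages this one depends on
    std::vector<int> rdepends;   // reverse dependencies (transposed edges)
};

using PackageGraph = std::vector<Package>;
```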
If we also construct the reverse-dependency list in the backing
store (the cached list), then determining strongly connected regions
becomes easier, since the reverse dependencies form the transpose of
the original directed graph. Alternatively, we may use Tarjan's
variation of depth-first search to find strongly connected
regions.
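Given both the forward graph and its transpose (the cached reverse
dependencies), strongly connected regions fall out of two DFS passes, as in
Kosaraju's method. A sketch under the same illustrative numeric layout as
before; both passes are O(V + E):

```cpp
#include <vector>

// Pass 1: DFS on the forward graph, recording nodes in finishing order.
static void order_dfs(int u, const std::vector<std::vector<int>>& g,
                      std::vector<char>& seen, std::vector<int>& finish) {
    seen[u] = 1;
    for (int v : g[u]) if (!seen[v]) order_dfs(v, g, seen, finish);
    finish.push_back(u);
}

// Pass 2: DFS on the transposed graph, labelling one component per root.
static void collect_dfs(int u, const std::vector<std::vector<int>>& g,
                        std::vector<int>& comp, int label) {
    comp[u] = label;
    for (int v : g[u]) if (comp[v] < 0) collect_dfs(v, g, comp, label);
}

// fwd is the dependency graph, rev its transpose (reverse dependencies).
// Returns comp[u] = component label for each node.
std::vector<int> scc(const std::vector<std::vector<int>>& fwd,
                     const std::vector<std::vector<int>>& rev) {
    int n = (int)fwd.size();
    std::vector<char> seen(n, 0);
    std::vector<int> finish;
    for (int u = 0; u < n; ++u)
        if (!seen[u]) order_dfs(u, fwd, seen, finish);
    std::vector<int> comp(n, -1);
    int label = 0;
    for (int i = n - 1; i >= 0; --i)      // reverse finishing order
        if (comp[finish[i]] < 0)
            collect_dfs(finish[i], rev, comp, label++);
    return comp;
}
```

Nodes sharing a label would then be coalesced and handed to dpkg in one run.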
For dependency checking, we would need to have lists of
packages installed, packages available, and packages which are
candidates for installation.
We would need ways of determining whether a package is a
member of a list, and of iterating over all packages in a list,
efficiently.
My initial proposal is to create a data structure that
contains pointers into the cached data (the backing store, I mean),
where lookups and traversals are cheap. My candidate for this is a
red-black tree.
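As an aside, std::map in the common C++ library implementations is already a
red-black tree, giving O(log n) lookup and ordered traversal for free; a
sketch of the proposed index (PkgRecord and its fields are hypothetical,
standing in for whatever we point at in the backing store):

```cpp
#include <map>
#include <string>

// Illustrative record: a pointer into the cache plus some status bits.
struct PkgRecord {
    const char* cache_offset;   // where this package lives in the backing store
    bool installed;
    bool candidate;             // candidate for installation
};

// Keyed by package name; red-black tree underneath in typical libraries.
using PkgIndex = std::map<std::string, PkgRecord>;

// Membership test in O(log n); iteration over the map visits packages
// in sorted name order.
bool is_member(const PkgIndex& idx, const std::string& name) {
    return idx.find(name) != idx.end();
}
```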
This is a far larger post than I originally intended.
Any suggestions?
manoj
ps: Jason, what's the word on threads? I have written a mutex class, a
condition-variable class, and a write-once-read-many class, which I can
maybe rewrite under the GPL or something. I'd like to know if I need
to plan to include locks in my red-black tree.
--
By definition, when you are investigating the unknown, you do not
know what you will find.
Manoj Srivastava <srivasta@acm.org> <http://www.datasync.com/%7Esrivasta/>
Key C7261095 fingerprint = CB D9 F4 12 68 07 E4 05 CC 2D 27 12 1D F5 E8 6E