
topological sorting and data structures



Hi,

Algorithms:
	I have been reading up on directed acyclic graphs and
 topological sorting. There are two ways of doing the sort that I can
 see:
 a) Do a depth-first search, printing out the node name after
    traversing the children (so each node occurs after all its
    dependents are listed; reverse the order when printing).
  Introduction to Algorithms [Thomas H. Cormen, Charles E. Leiserson,
  Ronald L. Rivest]
  Algorithms in C [Robert Sedgewick]
   
 b) Look at all nodes with no dependencies and list them in any
    order; for each node listed, find the packages that depend on it
    and decrement their dependency counts by one. Repeat for the
    nodes whose counts reach zero after decrementing. (A sketch
    follows below.)
  Introduction to Algorithms [Thomas H. Cormen, Charles E. Leiserson,
  Ronald L. Rivest]
  Discrete Mathematics and Its Applications [Kenneth H. Rosen]
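
	A minimal sketch of method (b) in C++, just to fix ideas; the
 Node representation here is invented for the sketch, not our actual
 data structure:

#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical representation: for each package, the list of
// packages that depend on it, plus a count of the dependencies it
// has itself.
struct Node {
    std::vector<std::string> dependents;  // packages depending on this one
    int ndeps = 0;                        // dependencies not yet listed
};

// Method (b): repeatedly emit nodes with no outstanding
// dependencies, decrementing the counts of their dependents.
std::vector<std::string> topo_sort(std::map<std::string, Node>& graph)
{
    std::vector<std::string> order;
    std::queue<std::string> ready;

    for (const auto& entry : graph)
        if (entry.second.ndeps == 0)
            ready.push(entry.first);

    while (!ready.empty()) {
        std::string name = ready.front();
        ready.pop();
        order.push_back(name);
        for (const std::string& d : graph[name].dependents)
            if (--graph[d].ndeps == 0)
                ready.push(d);
    }
    // If order.size() < graph.size(), the leftover nodes contain
    // cycles.
    return order;
}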

	Both algorithms can run in O(V+E), where V is the number of
 nodes (packages), and E is the number of edges (dependencies). (I'm
 still struggling with an exercise in Introduction to Algorithms
 [Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest] which
 suggests that one may determine cycles in an undirected graph using
 an O(V) algorithm.)

	Both these algorithms shall have to deal with cycles, possibly
 by cutting links such that the cycle disappears. (In the first
 algorithm, refuse to visit a node twice, colouring the nodes as
 needed; in the second, remove a dependency from the node with the
 lowest remaining in-degree, decrementing its count, until it reaches
 0 and one can try again.)
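
	For the first algorithm, the colouring trick might look like
 this (again just a sketch; the graph representation is an
 assumption, with edges pointing from a package to its
 dependencies):

#include <map>
#include <string>
#include <vector>

enum Color { WHITE, GRAY, BLACK };  // unvisited, in progress, done

// Hypothetical representation: each package maps to the list of
// packages it depends on.
using Graph = std::map<std::string, std::vector<std::string>>;

// Post-order DFS: a package is emitted only after everything it
// depends on, so with this edge direction `order` comes out as a
// valid installation order.
static void visit(const Graph& g, const std::string& name,
                  std::map<std::string, Color>& color,
                  std::vector<std::string>& order)
{
    color[name] = GRAY;
    auto it = g.find(name);
    if (it != g.end()) {
        for (const std::string& dep : it->second) {
            // A GRAY dependency means we have walked into a cycle;
            // the edge is simply ignored, which snips the cycle
            // here. BLACK nodes are already finished.
            if (color[dep] == WHITE)
                visit(g, dep, color, order);
        }
    }
    color[name] = BLACK;
    order.push_back(name);
}

std::vector<std::string> topo_sort(const Graph& g)
{
    std::map<std::string, Color> color;
    std::vector<std::string> order;
    for (const auto& entry : g)
        if (color[entry.first] == WHITE)
            visit(g, entry.first, color, order);
    return order;
}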

	I'd also like to special-case strongly connected regions of
 the graph, since it is quite likely that the only cycles we may ever
 see will occur in strongly connected segments (package A depends on
 package B; package B also depends on package A).

	Most algorithmic methods for determining strongly connected
 regions of a graph are based on a depth-first search, hence my
 predilection for that one algorithm, even though one gets fewer
 choices about the edge one snips to resolve cycles. I hope that if
 we coalesce strongly connected regions and install all those
 packages in one run of dpkg, we won't have that many cycles to
 contend with.

Representation:
	Since the number of dependencies for any package is far
 smaller than the total number of packages, an adjacency list would
 be a far more efficient way of representing the graph. This is
 already very close to the data structure we use.
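
	To make the space argument concrete, one possible layout (a
 sketch only; the field names are invented, not what dpkg uses):
 with V packages and E dependencies this takes O(V+E) space, where an
 adjacency matrix would need O(V^2).

#include <string>
#include <vector>

// Hypothetical adjacency-list layout, indexed by package number.
struct PackageGraph {
    std::vector<std::string> names;          // index -> package name
    std::vector<std::vector<int>> depends;   // index -> packages it depends on
    std::vector<std::vector<int>> rdepends;  // index -> packages depending on it
};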

	If we also construct the reverse dependency list in the
 backing store (cached list), then determining strongly connected
 regions becomes easier, since the reverse dependencies form the
 transpose of the original directed graph. Failing that, we may use
 Tarjan's variation of the depth-first search to find strongly
 connected regions.
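
	With both lists in hand, the strongly connected regions fall
 out of two depth-first passes, one over the forward lists and one
 over the transposed lists (Kosaraju's method). A sketch, with the
 forward and reverse adjacency lists passed in as vectors of
 indices:

#include <vector>

// Plain post-order DFS, recording finish order in `out`.
static void dfs(const std::vector<std::vector<int>>& adj, int v,
                std::vector<bool>& seen, std::vector<int>& out)
{
    seen[v] = true;
    for (int w : adj[v])
        if (!seen[w])
            dfs(adj, w, seen, out);
    out.push_back(v);
}

// depends[v]  : packages v depends on (forward edges)
// rdepends[v] : packages depending on v (transposed edges)
// Returns a component id per package; packages sharing an id form
// one strongly connected region.
std::vector<int> components(const std::vector<std::vector<int>>& depends,
                            const std::vector<std::vector<int>>& rdepends)
{
    const int n = static_cast<int>(depends.size());
    std::vector<bool> seen(n, false);
    std::vector<int> finish;
    for (int v = 0; v < n; ++v)
        if (!seen[v])
            dfs(depends, v, seen, finish);

    // Second pass on the transposed graph, taking roots in reverse
    // finish order; each tree found is one component.
    std::vector<int> comp(n, -1);
    std::vector<bool> seen2(n, false);
    int id = 0;
    for (int i = n - 1; i >= 0; --i) {
        int root = finish[i];
        if (seen2[root])
            continue;
        std::vector<int> members;
        dfs(rdepends, root, seen2, members);
        for (int m : members)
            comp[m] = id;
        ++id;
    }
    return comp;
}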

	For dependency checking, we would need to have lists of
 packages installed, packages available, and packages which are
 candidates for installation.

	We would need ways of determining whether a package is a
 member of a list, and of cycling over all packages in a list,
 efficiently.

	My initial proposal is to create a data structure that
 contains pointers to the cached data (backing store, I mean), where
 lookups and traversals are cheap. My candidate for this is a
 Red-Black tree.
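
	For what it is worth, std::map in the common C++ library
 implementations is such a red-black tree underneath; a sketch of
 the idea (CacheEntry stands in for whatever record lives in the
 backing store):

#include <map>
#include <string>

// Stand-in for a record in the cached list (backing store).
struct CacheEntry;

// std::map is typically implemented as a red-black tree, giving
// O(log n) lookup and cheap in-order traversal. The values are
// only pointers into the cache, so the tree itself stays small.
using PackageIndex = std::map<std::string, const CacheEntry*>;

// Membership test:  index.count(name) != 0
// Full traversal:   for (const auto& p : index) { /* p.first, p.second */ }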

	This is a far larger post than I originally intended.

	Any suggestions?

	manoj
 ps: Jason, what's the word on threads? I have written a mutex class,
 a condition variable class, and a write-once-read-many class, which I
 can maybe rewrite under the GPL or something. I'd like to know
 whether I need to plan on including locks in my red-black tree.
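 In case it helps, the shape of the locking I would plan for (a
 sketch only; std::shared_mutex stands in for the
 write-once-read-many class, and all the names here are invented):

#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Many-readers/one-writer locking around a tree-based index.
class LockedIndex {
public:
    // Readers take the lock shared, so lookups proceed in parallel.
    bool lookup(const std::string& name, std::string& out) const {
        std::shared_lock<std::shared_mutex> guard(lock_);
        auto it = map_.find(name);
        if (it == map_.end())
            return false;
        out = it->second;
        return true;
    }
    // Writers take the lock exclusive.
    void insert(const std::string& name, const std::string& value) {
        std::unique_lock<std::shared_mutex> guard(lock_);
        map_[name] = value;
    }
private:
    mutable std::shared_mutex lock_;
    std::map<std::string, std::string> map_;  // red-black tree underneath
};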
-- 
 By definition, when you are investigating the unknown, you do not
 know what you will find.
Manoj Srivastava  <srivasta@acm.org> <http://www.datasync.com/%7Esrivasta/>
Key C7261095 fingerprint = CB D9 F4 12 68 07 E4 05  CC 2D 27 12 1D F5 E8 6E

