[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: toppological sorting and datastructures



On 13 Nov 1997, Manoj Srivastava wrote:

> Hi,
> 
> 	The data structures used for most graph algorithms is either:
>  a) Adjacency matrix. Suppose the graph has V nodes and E edges. The
>     adjacency matrix is a |V| x |V| array, with Ai,j = ! if an edge
>     exists from vertex i to vertex j (or, in other words, it has a |E|
>     non null elements). This representation is good for dense arrays
>     (lots of edges). In our case, Vertexes are packages, and
>     dependency relations are edges, and the matrix is quite sparse.

Hm that would be quite the structure. 1000x1000x1 -> 1 Meg

>  b) adjacency list: This is an array (or linked list) of length
>     |V|. Each node is the head of a linked list of dependencies. I
>     think this is already how the backing store is represented (except
>     that we have multiple kinds of dependencies, each of which gets a
>     linked list (I think)

Check the cache.sgml on the last point there, the depends get dropped into
a single big linked list. You might as well use all of the depends data in
the ordering, with varying degrees of strictness. A package that suggests
another package should 'gently try' to install itself first. Remember,
an important goal is to have predictability in the algorithm.

b) looks like fairly simple. First off, there is an important
distinction. You are not working with packages, you are working with
-versions-. 

So I can give you a list of versions to install, a list of versions that
are installed and a list of versions that could possibly be installed if
you really wanted to. Note that there can be multiple versions to be
possibly installed for each package and so on.

> 	The list of packages that have to be ordered are the
>  candidates for installation, so I shall need a list of candidates
>  separately from the list of all known packages. It would also be nice
>  to have a list of installed packages (and what the heck, a list of
>  packages to be removed)

Again replace the above with versions. It's easy to do, something like:
   Version **RemVer = new Version *[Cache.HeaderP->VersionCount+1]
   int J = 0;
   for (int I = 0; I != _count(Cache.HeaderP->HashTable); I++)
   {
      // Iterate over all the packages
      pkgCache::Package *P = Cache.PkgP + Cache.HeaderP->HashTable[I];
      for (; P != Cache.PkgP; P = Cache.PkgP + P->NextPackage)
         if (P->CurrentVer != 0 && P->SelectedState == pkgSTATE_Purge)
         {
            RemVer[J] = Cache.VerP + P->CurrentVer;
            J++;
         }   
   }
   RemVer[J] = 0;
 -- Taken from bits of pkgtreeitem.cc

Tada, list of versions to be removed. Similar iterations are used to
generate other forms of lists. I'm trying to come up with an iterator
scheme, but nothing so far. The crazy syntax is to support the readonly
memory map of the cache file. 

> 	What I need are some kind of a separate index/list. The
>  operations it wouls have to support are 
>  a) does package X belong to list?
>  b) for each package that belongs to list X do {}
> 	I have used an efficient RB-tree implementation to simulate a
>  set in the past. If there are indices in the cache generator that fit
>  the needs, then I'll gladly forego writing the tree ;-)

If you really need 'member of set' then this is a job for the extension
structures, it will be many times faster and easier to use that sort of
scheme then to use a RB tree. Also, depending on the nature of the data
you are deriving it may already be present in the cache, ie it is trivial
to tell if a version is marked for removal (see above). It is similary
trivial to tell if a version is marked for install or upgrade.

> 	It would be nice to be able to construct a list on the fly
>  (list of all packages which pre depend on others and thus mark the
>  end of a run for dpkg; pre-dependees and pre-dependent packages are
>  installed on separate runs of dpkg). I can use the same
>  infrastructure to build the final ordered list too.
> 
> 	I'd need the data structure that can be added to the backing
>  store (coloring nodes during an ordering run, for example).
> 
> 	manoj
>  ps: actually, package name, version, and optionally, source and
>  architecture determine a vertex, if one is to be pedantic.
> -- 
>  "Neurotic: Self-taut person." Author Unknown
> Manoj Srivastava  <srivasta@acm.org> <http://www.datasync.com/%7Esrivasta/>
> Key C7261095 fingerprint = CB D9 F4 12 68 07 E4 05  CC 2D 27 12 1D F5 E8 6E
> 
> 


Reply to: