
Re: Package Pool Proposal



On Mon, Nov 22, 1999 at 10:55:41PM -0800, Guy Maor wrote:
> Dependency information.  Yes, dependency information is clearly there,
> and is used as a rule on whether to delete something from the archive
> (no more binaries without source, etc.) and also on whether to include
> something in the distribution (no more missing dependencies).

Dependencies between source and binary packages, and between things like
netbase and libwrap0, are really different beasts. It seems a bit weird
combining them. Personally, I wouldn't even worry too much about having
the pkgpool db take care of the latter at all, but rather leave that to
some of the distribution code.

> I'll start thinking about how the database needs to be structured to
> make the types of queries we need to do be cheap to do.  There needs
> to be some concept of package-sets: all the source and debs for a
> package.

Being able to go from binary to source and back again is helpful. Being
able to automatically generate Packages files is helpful (obviously).
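
(And generating a Packages file from a set of package stanzas is pretty
mechanical. A rough perl sketch, assuming each stanza is kept as a hash
of field => value; write_packages and the toy stanza at the bottom are
purely illustrative, not real code:)

    #!/usr/bin/perl -w
    # Sketch: write stanzas back out in Packages-file format.  Fields
    # other than Package come out in arbitrary (sorted) order here; a
    # real generator would preserve the conventional order.
    use strict;

    sub write_packages {
        my ($fh, @stanzas) = @_;
        for my $s (@stanzas) {
            for my $f ("Package", grep { $_ ne "Package" } sort keys %$s) {
                print $fh "$f: $s->{$f}\n" if defined $s->{$f};
            }
            print $fh "\n";                 # blank line between stanzas
        }
    }

    # toy usage, with a hand-written stanza
    write_packages(\*STDOUT, {
        Package      => "hello",
        Version      => "1.3-16",
        Architecture => "i386",
        Depends      => "libc6",
    });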

[just after the dinstall run...] 
> We iterate through the distributions, pushing every package (or
> package-set?  probably mixing package-sets is bad)

At the moment, we mix package-sets fairly heavily. For example, the
current libreadlineg2 is based on bash 2.02.1-1.6, whereas the current
bash is -1.8. There are similar issues with old libstdc++ versions. This
is completely aside from any rebuilding lag amongst the different
architectures too.

Personally, I'd think being able to access, essentially,
<distribution>/<arch>/Packages, and <distribution>/source/Sources files,
for `distributions' of, eg, "stable", "unstable", "experimental" and
"Incoming", would be enough. [0]

This assumes a perl library that can get the interesting information from
a Packages file in a usable form, of course. Such a library is pretty trivial
to write, though.

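(For reference, a minimal sketch of such a library; parse_packages is
just a name I made up, and the code is purely illustrative rather than
anything that exists:)

    #!/usr/bin/perl -w
    # Sketch: parse a Packages (or Sources) file into a hash of
    # package name => { field => value }.
    use strict;

    sub parse_packages {
        my ($file) = @_;
        open my $fh, '<', $file or die "can't open $file: $!";
        local $/ = "";              # paragraph mode: one stanza per read
        my %pkgs;
        while (my $stanza = <$fh>) {
            my (%fields, $last);
            for my $line (split /\n/, $stanza) {
                if ($line =~ /^(\S+):\s*(.*)$/) {
                    $fields{$1} = $2;
                    $last = $1;
                } elsif ($line =~ /^\s/ && defined $last) {
                    $fields{$last} .= "\n$line";    # continuation line
                }
            }
            $pkgs{$fields{Package}} = \%fields if exists $fields{Package};
        }
        close $fh;
        return \%pkgs;
    }

    # eg, print each package's dependencies
    my $pkgs = parse_packages("Packages");
    for my $p (sort keys %$pkgs) {
        print "$p: ", $pkgs->{$p}{Depends} || "(no depends)", "\n";
    }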

> through it, and
> letting the distribution choose which ones it wants.  There's some
> sort of dependency check routine that the distributions can use to
> make sure they remain consistent though that's a hard problem.  [Given
> a set of packages and relationships, choose a maximal set of them
> which is consistent.  A simple heuristic that might work is to ignore
> dependency information on a first pass and make variations, converging
> to a fixed point.]

This is probably a lot more complicated than what you really want. For
example, if you accidentally break glibc2.1 (make it depend on gconv-modules
rather than just recommend it, and accidentally fail to upload gconv-modules,
say), then your "maximal subset of packages" could instead be slink for
a few days, then become potato again once glibc2.1 is fixed.

What I've been considering, at least, is more along the lines of: "this
is what you've already got, which is more or less consistent; of these
new packages, choose as many as you can without breaking anything more".

This at least lets you work on subsets of size |new packages|, instead
of |stable + unstable + experimental + new packages|, and I *think*
that makes a greedy (find a solution that can't be easily improved,
rather than find the best possible solution) algorithm fairly reasonable.

An outline of the algorithm I'm currently thinking of is something like:
(being about as vague as possible)

	TESTING = current contents of testing dist
	NEW = new packages from unstable that:
		(a) are installable in unstable
		(b) are up to date for all arches
		(c) have no/fewer release critical bugs

	while |NEW| > 0; do
		take a package, p, from NEW

		try adding p to TESTING
		if p's not installable in TESTING,
			add dependencies of p until it is, or
			give up, and try a different p

		if p makes any other packages uninstallable,
			try upgrading each of them from NEW too, or
			give up and try a different p
		commit additions to TESTING
	done

	write TESTING to disk, update .debs, whatever

(for reference, I think I'm finally at about the point where I've got
the infrastructure to start trying to code that algorithm too... Inputs
and outputs are a bunch of Packages and Sources files)
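
(In those terms, and assuming the stanza parsing above plus some
hypothetical $installable callback that does the real Depends grokking
per arch, the loop might look vaguely like this. Very much a sketch,
with made-up names throughout:)

    #!/usr/bin/perl -w
    # Sketch of the greedy loop above.  $installable is a hypothetical
    # callback; checking Depends/Conflicts, virtual packages, versioned
    # dependencies and per-arch Packages files is where the real work is.
    use strict;

    sub update_testing {
        my ($testing, $new, $installable) = @_;  # pkg name => stanza hashrefs
        my %cur = %$testing;                     # work on a copy of TESTING

        for my $p (keys %$new) {
            my %candidate = (%cur, $p => $new->{$p});
            my @added = ($p);

            # if p isn't installable, try pulling its dependencies in
            # from NEW too; give up on p if that doesn't help
            unless ($installable->(\%candidate, $p)) {
                my $ok = 0;
                for my $dep (split /\s*,\s*/, $new->{$p}{Depends} || "") {
                    $dep =~ s/\s*[(|].*//;  # crude: drop versions, alternatives
                    next unless exists $new->{$dep};
                    $candidate{$dep} = $new->{$dep};
                    push @added, $dep;
                    $ok = $installable->(\%candidate, $p);
                    last if $ok;
                }
                next unless $ok;            # try a different p
            }

            # if p (or its deps) break anything already in TESTING, give
            # up on p.  (the "try upgrading the broken ones from NEW too"
            # refinement is left out of this sketch)
            next if grep { !$installable->(\%candidate, $_) } keys %cur;

            @cur{@added} = @candidate{@added};   # commit p and friends
        }
        return \%cur;   # write this back out as Packages/Sources files
    }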

At any rate, I agree with Jason --- version 1.0 doesn't even need to
consider any of that stuff to be a Really Cool Improvement.

> Distributions choose binaries, not sources, so then we run it through
> another routine which keeps all in-use source packages.

Having access to the sources would still be helpful though, I think.
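
(Working out which sources are still in use from the chosen binaries is
easy enough, given that binary stanzas carry a Source: field whenever
the source name differs from the package name. Again just a sketch,
built on the stanzas above; in_use_sources is a made-up name:)

    # Sketch: given the binary stanzas the distributions ended up
    # choosing, collect the source packages still in use; anything not
    # in the returned set is a candidate for deletion.
    sub in_use_sources {
        my (@chosen) = @_;              # list of binary stanza hashrefs
        my %keep;
        for my $b (@chosen) {
            my $src = $b->{Source} || $b->{Package};  # no Source: => same name
            $src =~ s/\s*\(.*//;        # strip any "(version)" part
            $keep{$src} = 1;
        }
        return \%keep;
    }
    # eg: my $keep = in_use_sources(values %{ parse_packages("Packages") });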

> Finally any not in-use package sets are deleted.

All sounds very good, at any rate.

Cheers,
aj

[0] To ramble a bit. This might mean associating a script with each
    distribution, and having a dependency hierarchy like "stable can
    only be updated after proposed-updates is done", "unstable-nonmu
    can only be updated after unstable has been", and so on.

    This can be reasonably arranged by using tsort(1) or something,
    of course.
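
    (eg, print "run A before B" pairs to tsort(1) and it spits out an
    order to run the scripts in; the pairs below are just the orderings
    mentioned above, nothing definitive:)

        #!/usr/bin/perl -w
        # Sketch: pipe "A B" pairs (A must run before B) to tsort(1),
        # which prints a topologically sorted order for the scripts.
        use strict;

        my @before = (
            [ "proposed-updates", "stable" ],
            [ "unstable",         "unstable-nonmu" ],
        );
        open my $ts, '|-', 'tsort' or die "can't run tsort: $!";
        print $ts "@$_\n" for @before;
        close $ts;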

-- 
Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. PGP encrypted mail preferred.

 ``The thing is: trying to be too generic is EVIL. It's stupid, it 
        results in slower code, and it results in more bugs.''
                                        -- Linus Torvalds
