
Re: Package Pool Proposal



Lots of good comments!

The hashing.  The first letter of the package name is clearly not a
good choice.  Of course I can use a real hash function, but I was
hoping that I could come up with something simple enough to calculate
in one's head.  Otherwise downloading a single file becomes a bit more
difficult.  Perhaps this isn't that important, and users would have to
query the database to find the real path?
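
To make the trade-off concrete, here is a minimal sketch, assuming
Python and a hypothetical two-character hash prefix (neither is part
of the proposal): a prefix anyone can compute in their head versus one
that needs a real hash function and, in practice, a database lookup.

    import hashlib

    def pool_path_simple(package):
        # Head-computable: just the first letter of the package name.
        # Easy to work out by hand, but the buckets end up very uneven
        # (think how many packages start with "lib").
        return "pool/%s/%s" % (package[0], package)

    def pool_path_hashed(package):
        # A real hash spreads packages evenly, but nobody computes an
        # MD5 prefix in their head; they would have to ask the database.
        prefix = hashlib.md5(package.encode()).hexdigest()[:2]
        return "pool/%s/%s" % (prefix, package)

    print(pool_path_simple("libgtk1.2"))  # pool/l/libgtk1.2
    print(pool_path_hashed("libgtk1.2"))  # pool/<two hex digits>/libgtk1.2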

The database.  I haven't really picked an implementation technology
and welcome advice.  I think whichever I pick, it should be accessible
via LDAP and web.  Maybe the entire database is downloadable from the
ftp site.  All archive information, including which version of which
package is available for each architecture, will be there.  Build
daemons could use it to find out what to build, place reservations on
packages they are building so another build daemon doesn't grab them,
etc.
Authentication on database modifications will be tied to the current
maintainer LDAP server.
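
As a sketch of the build-daemon reservation idea, an atomic
reservation might look roughly like this.  The table, column names,
and the choice of SQLite are purely illustrative, since no
implementation technology has been picked yet.

    import sqlite3

    # Illustrative only: the schema is invented, not part of the proposal.
    db = sqlite3.connect("archive.db")
    db.execute("""CREATE TABLE IF NOT EXISTS build_queue (
                    source TEXT, version TEXT, arch TEXT,
                    builder TEXT,        -- NULL until a daemon reserves it
                    PRIMARY KEY (source, version, arch))""")

    def reserve(source, version, arch, builder):
        # The "builder IS NULL" condition makes the reservation atomic:
        # whichever build daemon updates the row first wins, so two
        # daemons can't grab the same package.
        cur = db.execute("""UPDATE build_queue SET builder = ?
                            WHERE source = ? AND version = ? AND arch = ?
                              AND builder IS NULL""",
                         (builder, source, version, arch))
        db.commit()
        return cur.rowcount == 1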

Lack of symlinks.  Yes, I want to get rid of them completely, and that
includes the ones from binary-<arch> to binary-all.  Marcus, you're
right that the pool symlink should be one level higher.  A possible
future general solution for the Hurd would add binary-hurd-all.  I suppose
I should use binary-linux-all and reserve binary-all for
cross-architecture, cross-OS packages, but actually how it's stored
in the pool doesn't really matter as long as I can avoid collisions.
I could always invent a new architecture field, store them in the pool
with the full dpkg-name name, and write distribution selection code to
choose the right one.  Dividing the pool into architectures is still
important because a lot of people want to easily mirror an
architecture subset.
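
Storing everything under its full dpkg-name name and letting the
distribution selection code pick the right file might look roughly
like this; the fallback order and the file names are made up for
illustration.

    # Given all the .deb files for one package stored under their full
    # dpkg-name names (package_version_arch.deb), pick the one a given
    # architecture should get, falling back to an arch-independent build.
    def pick_deb(filenames, wanted_arch, fallbacks=("linux-all", "all")):
        by_arch = {}
        for name in filenames:
            package, version, arch = name[:-len(".deb")].split("_")
            by_arch[arch] = name
        for arch in (wanted_arch,) + tuple(fallbacks):
            if arch in by_arch:
                return by_arch[arch]
        return None

    debs = ["clock_1.0-1_i386.deb", "clock_1.0-1_m68k.deb"]
    print(pick_deb(debs, "m68k"))                   # clock_1.0-1_m68k.deb
    print(pick_deb(["doc_2.1-3_all.deb"], "i386"))  # doc_2.1-3_all.deb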

No section hierarchy.  Categorizing our enormous package set along a
single section axis and a single priority axis is ridiculous.  I
regard both of these fields as practically useless.  We need to allow
a package to exist in multiple sections.  A package-selection program
could then show, for example, all GTK clock programs.  Priorities
might not be needed any more.
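
A toy example of what multiple sections per package buys, with
invented package names and section tags:

    # Each package can carry several section tags at once.
    sections = {
        "asclock":  {"x11", "clocks"},
        "gtkclock": {"gtk", "clocks"},   # invented package name
        "gtimer":   {"gtk", "utils"},
    }

    def packages_in(*wanted):
        # e.g. packages_in("gtk", "clocks") -> all gtk clock programs
        wanted = set(wanted)
        return sorted(p for p, tags in sections.items() if wanted <= tags)

    print(packages_in("gtk", "clocks"))   # ['gtkclock']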

Dependency information.  Yes, dependency information is clearly
there, and is used as a rule for deciding whether to delete something
from the archive (no more binaries without source, etc.) and whether
to include something in the distribution (no more missing
dependencies).
Possibly maintainers could be sent alerts when this happens.  A lot of
apt code gets reused here.
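
The two rules could be as simple as the toy checks below; the data
structures here just stand in for whatever the real database ends up
providing.

    # binary package -> (source package it was built from, its depends)
    binaries = {
        "foo":    ("foo-src", ["libbar"]),
        "libbar": ("bar-src", []),
        "orphan": ("gone-src", []),
    }
    sources = {"foo-src", "bar-src"}

    def binaries_without_source():
        # candidates for deletion from the archive
        return sorted(b for b, (src, _) in binaries.items()
                      if src not in sources)

    def missing_dependencies():
        # reasons to keep a package out of a distribution
        return [(b, d) for b, (_, deps) in binaries.items()
                for d in deps if d not in binaries]

    print(binaries_without_source())   # ['orphan']
    print(missing_dependencies())      # []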

Information in the database.  Everything should be in the database.
Copyrights, upstream URLs, changelogs, content listings, dependencies,
etc.  The package web pages should be generated dynamically from the
database.

Installing new packages without authorization.  Maybe this isn't such
a good idea.  We really ought to have some human checking things
before they make it into the archive.  Since we now have two new
people processing incoming, and many more waiting to take over once
those two get sick of it (truly, it's boring work), I am leaning
toward keeping the current system.

I'll start thinking about how the database needs to be structured so
that the kinds of queries we need to make are cheap.  There needs to
be some concept of package-sets: all the source and debs for a
package.
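
One possible shape for a package-set, just to make the idea concrete
(the fields and file names are invented):

    from collections import namedtuple

    # A package-set: the source plus every .deb built from it.
    PackageSet = namedtuple("PackageSet", "source version dsc debs")

    hello = PackageSet(
        source="hello",
        version="1.3-16",
        dsc="pool/h/hello_1.3-16.dsc",
        debs=["pool/h/hello_1.3-16_i386.deb",
              "pool/h/hello_1.3-16_m68k.deb"],
    )

    # "everything belonging to hello 1.3-16" becomes one cheap lookup
    # instead of a crawl over the directory tree.
    print(hello.dsc, len(hello.debs))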

The daily run code becomes pretty easy:

dinstall runs, does some sanity checks on the packages, installs them
into the pool, and records them in the database.  It probably would
still require some human to OK a new package.  It doesn't delete
anything from the archive.
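
A bare-bones sketch of that step, with helper behaviour I've invented;
the real dinstall does far more than this.

    import os, shutil

    def check_package(path):
        # placeholder sanity check; the real thing verifies signatures etc.
        return os.path.exists(path) and path.endswith(".deb")

    def pool_path(path):
        name = os.path.basename(path)
        return os.path.join("pool", name[0], name)

    def dinstall(incoming, database):
        for deb in incoming:
            if not check_package(deb):
                print("rejected:", deb)      # a human would see a report
                continue
            dest = pool_path(deb)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy(deb, dest)           # install into the pool
            database[os.path.basename(deb)] = dest   # record it
            # note: nothing is ever deleted at this stage

    db = {}
    # sample path doesn't exist, so it simply gets rejected
    dinstall(["incoming/hello_1.3-16_i386.deb"], db)
    print(db)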

We iterate through the distributions, pushing every package (or
package-set?  probably mixing package-sets is bad) through each one,
and letting the distribution choose which ones it wants.  There's some
sort of dependency check routine that the distributions can use to
make sure they remain consistent, though that's a hard problem.
[Given a set of packages and relationships, choose a maximal subset of
them which is consistent.  A simple heuristic that might work is to
ignore dependency information on a first pass and make variations,
converging to a fixed point.]
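
One reading of the bracketed heuristic, sketched out: keep dropping
packages whose dependencies aren't in the chosen set until nothing
changes (the fixed point).  The dependency data below is made up.

    def consistent_subset(wanted, depends):
        # depends: package -> list of packages it needs
        chosen = set(wanted)
        changed = True
        while changed:
            changed = False
            for pkg in sorted(chosen):
                if any(dep not in chosen for dep in depends.get(pkg, [])):
                    chosen.discard(pkg)   # its dependencies aren't all here
                    changed = True
        return chosen

    deps = {"gnumeric": ["libgnome"], "libgnome": ["libc6"], "libc6": []}
    print(consistent_subset({"gnumeric", "libgnome", "libc6"}, deps))
    print(consistent_subset({"gnumeric", "libc6"}, deps))  # gnumeric dropped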

Distributions choose binaries, not sources, so we then run the result
through another routine which keeps all in-use source packages.

Finally, any package sets that are no longer in use are deleted.
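
The last two passes together might look like this; the package names
and the binary-to-source mapping are invented.

    # Which binaries did the distributions choose, and which source
    # produced each one?
    chosen_binaries = {"hello", "libbar"}
    source_of = {"hello": "hello-src", "libbar": "bar-src", "old": "old-src"}
    all_package_sets = {"hello-src", "bar-src", "old-src"}

    in_use_sources = {source_of[b] for b in chosen_binaries}  # keep these
    to_delete = all_package_sets - in_use_sources             # drop the rest

    print(sorted(in_use_sources))   # ['bar-src', 'hello-src']
    print(sorted(to_delete))        # ['old-src']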


Guy

