Re: Package Pool Proposal

To: Guy Maor <maor@debian.org>
Cc: debian-devel@lists.debian.org
Subject: Re: Package Pool Proposal
From: Jason Gunthorpe <jgg@ualberta.ca>
Date: Tue, 23 Nov 1999 00:55:11 -0700 (MST)
Message-id: <Pine.LNX.3.96.991123001231.14774c-100000@wakko.deltatee.com>
Reply-to: Jason Gunthorpe <jgg@ualberta.ca>
In-reply-to: <87ln7pelxu.fsf@femto.dyn.ml.org>

On 22 Nov 1999, Guy Maor wrote:

> Otherwise downloading a single file becomes a bit more difficult.
> Perhaps this isn't that important, and users would have to query to
> database to find the real path?

I liked the result where we special-case common prefixes, i.e. lib* uses the
3rd char, and so on. It gave a good distribution and wasn't too hard to
guess.
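
A minimal sketch of that kind of scheme in Python. The prefix table and the
function name are hypothetical, and the character index used for lib* is an
assumption (here: the first character after the prefix), not a settled rule.

```python
# Hypothetical table: common prefix -> zero-based index of the character
# used as the bucket. The index chosen for "lib" is an assumption.
COMMON_PREFIXES = {"lib": 3}


def pool_bucket(package: str) -> str:
    """Return the one-character pool subdirectory for a package name."""
    for prefix, index in COMMON_PREFIXES.items():
        # Only special-case names long enough to have a character there.
        if package.startswith(prefix) and len(package) > index:
            return package[index]
    # Default: bucket by the first character of the name.
    return package[0]
```

With this table, a name like libfoo lands in "f" rather than piling up with
every other library in "l", and a user can still guess the directory from the
package name alone.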

> The database.  I haven't really picked an implementation technology
> and welcome advice.  I think whichever I pick, it should be accessible
> via LDAP and web.  Maybe the entire database is downloadable from the

I would be tempted to say that straight LDAP [db2 backend] would be a wise
choice. It is nice and flexible and the security system is exactly what we
want. We might have some problems getting a schema; I know Ben Collins has
thought about this problem [Ben?]

The big drawback is that it doesn't have any sort of server-side query
language; in particular, joins have to be done as discrete queries, though
the only join I can see using is between the source list and the binary
lists.
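
To illustrate what a client-side join looks like, here is a rough Python
sketch. It does not talk to a real directory server: the two lists stand in
for the results of separate searches, and the attribute names and entries are
made up for the example rather than taken from any agreed schema.

```python
# Stand-ins for two separate directory subtrees. All entries and
# attribute names are hypothetical.
SOURCES = [
    {"source": "hello", "section": "devel"},
]
BINARIES = [
    {"package": "hello", "source": "hello", "arch": "i386"},
    {"package": "hello-doc", "source": "hello", "arch": "all"},
    {"package": "other", "source": "other-src", "arch": "i386"},
]


def search(entries, **match):
    """Mimic one discrete query: entries whose attributes equal `match`."""
    return [e for e in entries if all(e.get(k) == v for k, v in match.items())]


def binaries_for_source(source_name):
    """Join source -> binaries using two discrete queries on the client."""
    found = search(SOURCES, source=source_name)       # query 1
    if not found:
        return []
    matches = search(BINARIES, source=found[0]["source"])  # query 2
    return [b["package"] for b in matches]
```

The cost is one extra round trip per join, which seems tolerable if the
source-to-binary lookup really is the only join needed.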

In any event, OpenLDAP has this nice backend feature; for instance, Ben
wrote a backend that can access the bugs db directly.

> package is available for each architecture will be there.  Build
> daemons could use it to find out what to build, place reservations

Moving all the buildd stuff into a central DB is also a good gain.

> includes the ones from binary-<arch> to binary-all.  Marcus, you're
> right that the pool symlink should be one level higher.  A possible

IMHO - ditch the pool symlink. Our archive structure already has full
paths starting at the root directory; there is no reason the Packages file
can't have pool/main/binary-i386/t/true_foo_i386.deb directly.

> No section hierarchy.  Categorizing our enormous package set with one
> section and priority axis is ridiculous.  I regard both these fields

Indeed, but until someone really makes a nice GUI they are required by
dselect.

> gtk clock programs for example.  Priorities might not be needed any
> more.

Priorities play a role in how APT makes some of its automatic choices;
it is good to be able to tell at least the core packages from the rest.

> Dependency information.  Yes, dependency information is clearly there,
> and is used as a rule on whether to delete something from the archive

> Information in the database.  Everything should be in the database.
> Copyrights, upstream URLs, changelogs, content listings, dependencies,
> etc.  The package web pages should be generated dynamically from the
> database.

I would like to also see the raw control file accessible outside the DB,
with just a quick cross-reference to it. IMHO start with an empty DB and
add things to it as you find uses for them.

How APT works is that it stores the information critical to its function
in its binary DB and then cross references the real package file for the
supplementary info. This has proved to be quite useful and efficient.
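
The general idea (a small index pointing back into the flat file) can be
sketched in Python as below. This is only an illustration of the technique,
not APT's actual cache format; the sample stanzas and function names are
made up, and multi-line (continuation) fields are ignored for brevity.

```python
import io

# Made-up flat file contents in the usual "Field: value" stanza layout.
SAMPLE = (
    b"Package: foo\n"
    b"Version: 1.0\n"
    b"Description: made-up example package\n"
    b"\n"
    b"Package: bar\n"
    b"Version: 2.1\n"
    b"Description: another made-up package\n"
)


def build_index(stream):
    """Record, per package, the byte offset where its stanza starts."""
    index = {}
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"Package:"):
            name = line.split(b":", 1)[1].strip().decode()
            index[name] = offset
    return index


def read_stanza(stream, offset):
    """Seek to a stanza and parse its single-line fields on demand."""
    stream.seek(offset)
    fields = {}
    for line in iter(stream.readline, b""):
        if not line.strip():
            break  # blank line ends the stanza
        key, _, value = line.decode().partition(":")
        fields[key] = value.strip()
    return fields
```

Only the offsets need to live in the fast store; descriptions and other
supplementary fields stay in the flat file and are read when asked for.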

All the hard problems relating to dependencies are really going to need
the whole dependency tree loaded in memory, and pulling it out of a DB is
not really any better than pulling it out of flat files - you still
have to parse and cross-reference it.
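
As a rough Python sketch of that parse-and-cross-reference step: the code
below turns Depends fields into an in-memory graph plus a reverse index. It
assumes the usual field syntax (commas between clauses, | between
alternatives, version constraints in parentheses) and simply drops the
version constraints; the function names are hypothetical.

```python
import re


def parse_depends(value):
    """Split a Depends field into a list of alternative groups of names."""
    groups = []
    for clause in value.split(","):
        # Strip "(>= 1.0)"-style constraints and surrounding whitespace.
        alts = [re.sub(r"\(.*?\)", "", alt).strip() for alt in clause.split("|")]
        alts = [a for a in alts if a]
        if alts:
            groups.append(alts)
    return groups


def build_graph(stanzas):
    """Return (graph, reverse) built from a list of field dictionaries."""
    graph = {s["Package"]: parse_depends(s.get("Depends", "")) for s in stanzas}
    reverse = {}
    for name, groups in graph.items():
        for group in groups:
            for dep in group:
                # Cross-reference: which packages mention `dep`?
                reverse.setdefault(dep, set()).add(name)
    return graph, reverse
```

Whether the stanzas come from a DB query or a flat file, this same work has
to happen before any dependency problem can be solved.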

Content listings, raw copyright text, and other big, not very useful,
items like that are probably more efficiently stored in a separate
(offline) db that is used exclusively by the special tools that need them,
mkcontents for instance.

> before they make it into the archive.  Since we now have two new
> people processing incoming, and many more waiting once those two get

Perhaps processing incoming can be made much simpler with this new system,
particularly less human intensive.

> make sure they remain consistent though that's a hard problem.  [Given
> a set of packages and relationships, choose a maximal set of them
> which is consistent.  A simple heuristic that might work is to ignore
> dependency information on a first pass and make variations, converging
> to a fixed point.]

Very hard actually - if I were you, I'd ignore it for the first attempt,
just re-create exactly what we have now. APT uses a routine like you
describe, but it does not act across multiple versions and it is not able
to converge in some strange degenerate cases (like perl :<)  AJ has given
some thought to this issue as well.
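
For what it is worth, a much simplified version of the fixed-point idea looks
like the Python sketch below. It is not APT's routine: it assumes a single
version of each package, handles only positive dependencies (no Conflicts),
and the function name is hypothetical. Under those assumptions it converges
to the largest set in which every dependency is satisfied.

```python
def consistent_subset(depends):
    """Prune packages with unsatisfiable dependencies until nothing changes.

    `depends` maps a package name to a list of alternative groups, where
    each group is a list of names and at least one must be present.
    """
    keep = set(depends)
    changed = True
    while changed:
        changed = False
        for pkg in sorted(keep):  # sorted() copies, so pruning is safe
            groups = depends[pkg]
            if any(not any(alt in keep for alt in group) for group in groups):
                keep.discard(pkg)
                changed = True
    return keep
```

Removing one package can break another, which is why the loop repeats until a
pass makes no changes. Versions and Conflicts are where it stops being this
easy.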

Jason
