
Re: Potato now stable



On Wed, Aug 23, 2000 at 04:44:15PM +1000, Anthony Towns wrote:
> (This may be better on the debian-pool list, I guess)
> 
> On Wed, Aug 23, 2000 at 03:26:51PM +1000, Drake Diedrich wrote:
> >    pinstall (part of the packagepool at http://master.debian.org/~dld/pool/
> > ) does the signature check and initial install into the pool database.  The
> > location within the pool archive is currently simply pool/sourcename .  The
> > pool itself can handle any hash though, 
> 
> The structure Jason and I have been discussing is slightly different,
> and has some really nice properties. The archive layout would be
> something like:
> 
> 	pool/main/libf/libfoo/libfoo-dev_1.2-1_i386.deb
> 
> That is:
> 
> 	* separate pools for each component (main, contrib, and non-free,
> 	  and presumably non-US/main, non-US/contrib, and non-US/non-free
> 	  also)

   The design I was working on would have these all as separate archives
consulting a common database.  Constraints can be (and currently are)
applied to ensure that there are no conflicting overlaps between archives
(same path, different checksum).  It's not much more complexity in pinstall
(a few lines in the hash function) to add greater detail, but with
multiple-archive capabilities the need is less than in the dinstall case.
There's also no reason two archives can't be on the same machine.  Sharing
the pool though would be a bit more complex and defeat the point of
separating them.  pool/main and pool/non-free could both be in the same
directory though, they'd just need separate incoming directories to feed
them.
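As a rough illustration of the cross-archive constraint described above, here is a minimal sketch that assumes a single shared mapping from pool path to checksum. An identical file may appear in several archives, but the same path with a different checksum is refused. The names SharedPoolDB, register() and ConflictError are hypothetical and are not pinstall's code.

```python
# Hypothetical sketch: several archives registering files in one shared
# database, refusing "same path, different checksum" overlaps.


class ConflictError(Exception):
    """Raised when a path is already registered with a different checksum."""


class SharedPoolDB:
    def __init__(self):
        # path -> (checksum, set of archive names that hold this file)
        self.files = {}

    def register(self, archive, path, checksum):
        """Record that `archive` holds `path`; return 'new' or 'shared'."""
        entry = self.files.get(path)
        if entry is None:
            self.files[path] = (checksum, {archive})
            return "new"
        known_checksum, archives = entry
        if known_checksum != checksum:
            # Conflicting overlap: same path, different contents.
            raise ConflictError(
                f"{path}: {archive} has {checksum}, "
                f"already registered as {known_checksum}"
            )
        # Identical file: it may legitimately overlap between archives.
        archives.add(archive)
        return "shared"
```

With this rule, a source uploaded unchanged to both main and non-free is accepted twice, and only a genuinely different file at the same path is rejected.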

> 	* hashing based on either the first character of the source
> 	  package, or the first four characters, if it's a library source
> 	  (to keep it fairly evenly distributed)

   It's certainly possible to do this (just coded, untested), but I question
whether it's actually necessary.  A few thousand source directories directly in
pool aren't all that difficult for the filesystem code to handle, especially
if they're relatively static and cached.  A single level of indirection and a
couple thousand inodes at each level gives us several million files already,
and filesystem code and hardware are both improving quickly.
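For comparison with the flat pool/sourcename layout, the hashed layout quoted above (pool/<component>/<hash>/<source>/<filename>) can be sketched in a few lines. The filename carries the name, version and architecture, and the source and component select the directory, which accounts for the five things needed to find a package. Treating "library source" as "name starts with lib" is an assumption made for this sketch, and pool_hash()/pool_path() are hypothetical names.

```python
# Sketch of the proposed layout: pool/<component>/<hash>/<source>/<file>.
# ASSUMPTION: a "library source" is any source name starting with "lib".


def pool_hash(source):
    """First four characters for lib* sources, else the first character."""
    if source.startswith("lib") and len(source) > 3:
        return source[:4]
    return source[0]


def pool_path(component, source, filename):
    """Build the pool path for a file belonging to a source package."""
    return "/".join(["pool", component, pool_hash(source), source, filename])
```

For example, pool_path("main", "libfoo", "libfoo-dev_1.2-1_i386.deb") gives pool/main/libf/libfoo/libfoo-dev_1.2-1_i386.deb. The capacity estimate above also checks out: one level of indirection with 2,000 entries at each level addresses 2,000 × 2,000 = 4,000,000 files.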

   Separating main/contrib/non-free is still problematic, though, as the .dsc
files do not list a section, and source files are installed in a first
pass before the .debs are (which completely avoids the case of .debs with no
source).  Source packages that want to generate both free and non-free
packages are also a problem, but if they're written so that their behavior
can be controlled at build time to create one or the other, then the
identical source could be uploaded to both archives under the same hash.  The
only easy alternative I can think of is banning mixed free/non-free source
packages (insisting that they be split or duplicated).  Looking in the .debs
ahead of time to decide where the source goes would be difficult.
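To show what the "look in the .debs" alternative would need, here is a sketch that infers a source's component(s) from the Section fields of its binaries. It relies on the Debian convention that contrib and non-free sections carry a "contrib/" or "non-free/" prefix while main sections are unprefixed. It also needs every .deb in hand before the source is placed, which the source-first pass described above does not provide. The function names are hypothetical, and non-US sections are deliberately not handled.

```python
# Sketch: infer components from binary Section fields (e.g. "non-free/x11").
# Non-US sections are not handled in this sketch.

KNOWN_COMPONENTS = {"main", "contrib", "non-free"}


def component_of_section(section):
    """Map a Section value such as 'non-free/graphics' to its component."""
    prefix, sep, _ = section.partition("/")
    if sep and prefix in KNOWN_COMPONENTS:
        return prefix
    return "main"  # unprefixed sections belong to main


def components_for_source(binary_sections):
    """Return the set of components a source package's binaries fall into."""
    return {component_of_section(s) for s in binary_sections}


def is_mixed(binary_sections):
    """True for the problem case: one source building free and non-free debs."""
    return len(components_for_source(binary_sections)) > 1
```

A source whose binaries come back as both main and non-free is exactly the case that would have to be split, duplicated, or uploaded to both archives.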


> 
> This means to find a package you need to know five things:
> 
> 	its name
> 	the version you want
> 	the architecture you want it for
> 	the name of its source package
> 	the component it's in

   For filenames I'm keeping the name that was uploaded, and if it exists
already then that binary isn't installed (fails only if the checksum/length
differ).  So all of this is met by the current pinstall.
   I haven't dealt with a related issue: epochs.  At the moment pinstall
would refuse to install a package whose version has a higher epoch but which
otherwise has the same filename as a package already in the pool.  I can't
think of a good reason we'd want this capability, and we don't currently have
it anyway, as dinstall allows only one version in unstable.
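The install rule and the epoch problem can be sketched together. Keep the uploaded filename, skip a file that is already present with the same checksum and length, and refuse it otherwise. Because Debian filenames omit the epoch, 1:1.2-1 and 1.2-1 map to the same name. The pool is an in-memory dict of filename to (md5, length), and deb_filename()/install_file() are hypothetical helpers, not pinstall's code.

```python
# Sketch of "keep the uploaded name; skip if identical, refuse if different".
# ASSUMPTION: the pool is modelled as {filename: (md5 hexdigest, length)}.
import hashlib


class PoolConflict(Exception):
    pass


def deb_filename(package, version, arch):
    """Build a .deb filename; the epoch ('N:' prefix) is not part of it."""
    _, sep, rest = version.partition(":")
    without_epoch = rest if sep else version
    return f"{package}_{without_epoch}_{arch}.deb"


def install_file(pool, filename, data):
    """Install data under filename; return 'installed' or 'skipped'."""
    key = (hashlib.md5(data).hexdigest(), len(data))
    existing = pool.get(filename)
    if existing is None:
        pool[filename] = key
        return "installed"
    if existing == key:
        return "skipped"  # same file uploaded again: nothing to do
    raise PoolConflict(f"{filename} already in pool with different contents")
```

An upload of libfoo 1:1.2-1 therefore lands on libfoo_1.2-1_i386.deb and is refused if a different 1.2-1 build is already there.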

> 
> It also means it's trivial to not mirror non-free if you want.

   Separate archives would also be trivial.  There are a few cases where it
would be advantageous to mirrors to allow overlaps, such as license
interpretation changes, dual free/non-free source packages, personal
non-distributable CDs, ...  2nd Law: it's easier to mix separate things than
to unmix combined things.

> Now, while the actual pool layout separates components at the top level,
> the database *doesn't* do this, at least the way we've been talking about
> on IRC. Instead, we have (in relational speak) a single huge table that
> basically says:
> 
>  [ binary_name, binary_version, architecture, source, component, ... ]
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> (translating, that is that given a binary package, its version and its
> architecture (ie, the filename libfoo-dev_1.2-3_i386.deb), we can uniquely
> determine the source, component, and any other applicable details)
> 
> This means we *won't* permit more than one of, eg:
> 
> 	pool/main/libf/libfoo/libfoo-doc_1.2-3_all.deb
> 	pool/main/libf/libfoo-hurd/libfoo-doc_1.2-3_all.deb
> 	pool/non-free/libf/libfoo/libfoo-doc_1.2-3_all.deb

  The pool I've implemented wouldn't allow this, but for different reasons. 
There is no deep hierarchy.  In addition I've experimented with a constraint
to refuse different files having the same checksum/length, which would
avoid the possibility of duplication within the archives.  I've removed it
for now as I need to come up with a way to allow duplicates on different
archives if they have the same full path.
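Both constraints discussed here can be expressed as table constraints. This sketch uses SQLite from the Python standard library. The key (binary_name, binary_version, architecture) follows the quoted proposal, so the three libfoo-doc paths cannot coexist. The UNIQUE (md5sum, size) constraint mirrors the experimental checksum/length check that has since been removed from pinstall. The table and column names are illustrative assumptions.

```python
# Sketch of the proposed uniqueness rules as SQLite constraints.
# Table/column names are illustrative, not the real pool schema.
import sqlite3

SCHEMA = """
CREATE TABLE binaries (
    binary_name    TEXT NOT NULL,
    binary_version TEXT NOT NULL,
    architecture   TEXT NOT NULL,
    source         TEXT NOT NULL,
    component      TEXT NOT NULL,
    path           TEXT NOT NULL UNIQUE,
    md5sum         TEXT NOT NULL,
    size           INTEGER NOT NULL,
    PRIMARY KEY (binary_name, binary_version, architecture),
    UNIQUE (md5sum, size)  -- experimental checksum/length rule
);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_binary(db, name, version, arch, source, component, path, md5sum, size):
    """Insert one row; return False if a constraint refuses it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO binaries VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (name, version, arch, source, component, path, md5sum, size),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Once libfoo-doc_1.2-3_all.deb is registered from pool/main/libf/libfoo/, attempts to register the same (name, version, arch) under libfoo-hurd or non-free return False.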

> It also means that when we decide which packages are in which
> distribution, we don't need to separate by component. So to specify
> packages in potato, say, we can simply have a table:
> 
> 	potato, fileutils, 4.0l-8, alpha
> 	potato, fileutils, 4.0l-8, arm
> 	potato, fileutils, 4.0l-8, i386
> 	potato, xv,        3.10a-25, alpha
> 	potato, xv,        3.10a-24, arm
> 	potato, xv,        3.10a-35, i386
> 
> (this would be done differently in LDAP. The relational way seems clearer
> for this purpose though, IMO)

CREATE TABLE pool (
        distribution INT4 NOT NULL REFERENCES distribution,
        deb     INT4 NOT NULL REFERENCES deb,
        arch    INT4 NOT NULL REFERENCES arch,
        section TEXT,
        install TIMESTAMP
);
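The quoted potato membership table can be tried out directly. This sketch loads the six example rows into an in-memory SQLite table with no component column, matching the point that distribution membership is not split by component, and answers "which xv is in potato on arm?". The key (distribution, package, arch), meaning one version per architecture per distribution, is an assumption made for this example. The table and function names are hypothetical.

```python
# Sketch of a distribution-membership table using the example rows above.
# ASSUMPTION: one version per (distribution, package, arch).
import sqlite3

ROWS = [
    ("potato", "fileutils", "4.0l-8", "alpha"),
    ("potato", "fileutils", "4.0l-8", "arm"),
    ("potato", "fileutils", "4.0l-8", "i386"),
    ("potato", "xv", "3.10a-25", "alpha"),
    ("potato", "xv", "3.10a-24", "arm"),
    ("potato", "xv", "3.10a-35", "i386"),
]


def build_db(rows):
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE dist_member ("
        " distribution TEXT NOT NULL,"
        " package TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " arch TEXT NOT NULL,"
        " PRIMARY KEY (distribution, package, arch))"
    )
    db.executemany("INSERT INTO dist_member VALUES (?, ?, ?, ?)", rows)
    return db


def version_in(db, distribution, package, arch):
    """Version of `package` in `distribution` for `arch`, or None."""
    row = db.execute(
        "SELECT version FROM dist_member"
        " WHERE distribution = ? AND package = ? AND arch = ?",
        (distribution, package, arch),
    ).fetchone()
    return row[0] if row else None


def packages_list(db, distribution, arch):
    """(package, version) pairs for one distribution and architecture."""
    return db.execute(
        "SELECT package, version FROM dist_member"
        " WHERE distribution = ? AND arch = ? ORDER BY package",
        (distribution, arch),
    ).fetchall()
```

Here version_in(db, "potato", "xv", "arm") returns "3.10a-24", and the component would come from the binaries table rather than from this one.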


> 
> It's probably suboptimal to have to have separate incoming queues.
> 
> The above layout basically just means you have to construct a new
> "Component:" field for the all-packages-in-the-pool table.
> 

    Which means new uploads of every package?  This wouldn't be required
with separate upload queues and separate archives - just don't process the
wrong set of packages when preloading each archive pool with the old
archives.
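The "don't process the wrong set" step can be sketched as a filter over the old tree. It assumes old paths of the form dists/<dist>/<component>/..., and each archive's preload keeps only the files under its own component. component_of_old_path() and preload_paths() are hypothetical helpers, not part of pinstall.

```python
# Sketch: preload one archive's pool from the old dists/ tree, skipping
# files that belong to another archive.
# ASSUMPTION: old paths look like dists/<dist>/<component>/...
from pathlib import PurePosixPath


def component_of_old_path(path):
    """Return the component element of a dists/ path, or None."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 3 and parts[0] == "dists":
        return parts[2]
    return None


def preload_paths(old_paths, component):
    """Only the old files that belong to the archive for `component`."""
    return [p for p in old_paths if component_of_old_path(p) == component]
```

Running the main archive's preload and the non-free archive's preload over the same file list then gives disjoint sets without any new Component field.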

> 
> In any event, it's not *making* policy decisions, it's simply enforcing
> them. I'm inclined to think having a separate "Component:" field in the
> source control stanza would be a better way of expressing it.
> 

   It would be nice if it existed, but we have to deal with the packages as
they are now, and once we've done that there'll be no need for a Component
field if we have separate upload queues after the changeover.

> If experimental .debs are to be included in the pool, the above would
> probably imply we'd end up with:
> 	dists/experimental/{main,contrib,non-free}
> which probably isn't a bad thing.

   I'm currently including them in the main pool; they just only get listed
in the dists/experimental/*/Packages.gz files and nowhere else.  It probably
won't happen often, but experimental packages might occasionally end up
being good enough for the stable track.
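The arrangement above treats each distribution's index as a filtered view of one shared pool. In this sketch an experimental .deb stored in pool/ appears only in the experimental listing, and it could later be listed for another distribution without moving or re-uploading the file. The (distribution, component, pool_path) membership tuples and build_indexes() are hypothetical.

```python
# Sketch: per-distribution indexes as views over a single pool.
# ASSUMPTION: membership is a list of (distribution, component, pool_path).
from collections import defaultdict


def build_indexes(membership):
    """Return {(distribution, component): sorted pool paths}.

    Each value is roughly what would be listed in
    dists/<distribution>/<component>/.../Packages.
    """
    indexes = defaultdict(list)
    for dist, component, path in membership:
        indexes[(dist, component)].append(path)
    return {key: sorted(paths) for key, paths in indexes.items()}
```

Promoting an experimental upload is then just one more membership entry pointing at the same pool file.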
   Under the current implementation dists/ still holds all of the old
dinstalled .debs and sources.  New uploads (and all of the .changes files)
go to pool/.  New Packages/Sources/Contents files are generated and placed
in dists/ as well.  Eventually dists/ will be emptied of .debs and sources
with no flag day.

-Drake


