Re: Potato now stable

To: debian-pool@lists.debian.org
Subject: Re: Potato now stable
From: Drake Diedrich <Drake.Diedrich@anu.edu.au>
Date: Sat, 26 Aug 2000 01:23:19 +1000
Message-id: <20000826012319.A11995@empire.anu.edu.au>
Mail-followup-to: Drake Diedrich <Drake.Diedrich@anu.edu.au>, debian-pool@lists.debian.org
In-reply-to: <20000825210944.A31804@azure.humbug.org.au>; from aj@azure.humbug.org.au on Fri, Aug 25, 2000 at 09:09:44PM +1000
References: <20000818163019.A12102@azure.humbug.org.au> <Pine.LNX.3.96.1000819185854.15115H-100000@wakko.deltatee.com> <20000823152651.B3917@empire.anu.edu.au> <20000823164415.C20232@azure.humbug.org.au> <20000823184238.A3851@empire.anu.edu.au> <20000825210944.A31804@azure.humbug.org.au>

On Fri, Aug 25, 2000 at 09:09:44PM +1000, Anthony Towns wrote:
> On Wed, Aug 23, 2000 at 06:42:39PM +1000, Drake Diedrich wrote:
> > > The structure Jason and I have been discussing is slightly different,
> > > and has some really nice properties. The archive layout would be
> > > something like:
> > > 	pool/main/libf/libfoo/libfoo-dev_1.2-1_i386.deb
> > > That is:
> > > 	* separate pools for each component (main, contrib, and non-free,
> > > 	  and presumably non-US/main, non-US/contrib, and non-US/non-free
> > > 	  also)
> >    The design I was working on would have these all as separate archives
> > consulting a common database.
> 
> I don't really understand what you mean by this. Can you give a sample
> layout? What do you mean by "archive"?
> 
> (Here's a glossary as I understand it:
> 	Component : main, contrib, non-free, etc
> 	Distribution : stable, woody, potato, etc
> 	Architecture : binary-i386

   The main archives are in different countries: Debian-US and Debian-non-US.
Debian-non-US, though, has dependencies on Debian-US, so either Debian-US has
to be hosted completely on non-US, or there has to be a database shared
between them. That is what the separate archives are intended for.
  Within each archive are a number of distributions (stable/woody/...).
.debs and sources can be stored in any directory within the archive, and
that location (the path to the file) is one of the fields in the file table.

> > Constraints can be (and currently are)
> > applied to ensure that there are no conflicting overlaps between archives
> > (same path, different checksum). It's not much more complexity in pinstall
> > (a few lines in the hash function) to add greater detail, but with
> > multiple-archive capabilities the need is less than in the dinstall case.
> > There's also no reason two archives can't be on the same machine.  Sharing
> > the pool though would be a bit more complex and defeat the point of
> > separating them.  pool/main and pool/non-free could both be in the same
> > directory though, they'd just need separate incoming directories to feed
> > them.
> 
> Similarly, I don't really see what you mean by "different paths", or
> why different archives might need to be on different machines, or what
> you mean by sharing the pool, or why any of this requires different
> incoming directories...

   "Paths" referred to the full path of each file within the filesystem of its archive.

   We will likely always need master servers in different countries for
legal and bandwidth reasons, so I've designed the pool database to
record that information (the archive), and to prevent those archives from
overlapping files that are not identical.
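   A minimal sketch of that overlap rule, assuming a hypothetical `file` table keyed by archive and path with an MD5 column (the real pinstall schema is not reproduced here, and the archive names below are just labels): a file is refused when any archive already records the same path with a different checksum, while byte-identical files may be shared between archives.

```python
import sqlite3

def open_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical file table (one row per file per archive)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file ("
        " archive TEXT NOT NULL,"
        " path    TEXT NOT NULL,"
        " md5sum  TEXT NOT NULL,"
        " PRIMARY KEY (archive, path))"
    )
    return db

def add_file(db: sqlite3.Connection, archive: str, path: str, md5sum: str) -> bool:
    """Record a file; refuse it if the same path exists anywhere with different contents."""
    clash = db.execute(
        "SELECT 1 FROM file WHERE path = ? AND md5sum <> ?", (path, md5sum)
    ).fetchone()
    if clash:
        return False
    # Identical file already recorded for this archive: nothing to do.
    db.execute("INSERT OR IGNORE INTO file VALUES (?, ?, ?)", (archive, path, md5sum))
    return True

db = open_db()
dsc = "pool/main/f/foo/foo_1.0-1.dsc"
print(add_file(db, "debian-us", dsc, "aaaa"))      # True
print(add_file(db, "debian-non-us", dsc, "aaaa"))  # True: identical overlap is allowed
print(add_file(db, "debian-non-us", dsc, "bbbb"))  # False: same path, different checksum
```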

   Sharing the pool would be running two archives on the same server and
overlapping the top level directory so that a single sources.list could
reference both.

> 
> > > 	* hashing based on either the first character of the source
> > > 	  package, or the first four characters, if it's a library source
> > > 	  (to keep it fairly evenly distributed)
> >    It's certainly possible to do this (just coded, untested), but I question
> > whether it's actually necessary.  A few thousand source directories right in
> > pool isn't all that difficult to handle by the filesystem code, especially
> > if it's relatively static and cached.
> 
> Well, it's mainly so people with ftp clients have *some* chance of
> navigating through it all. If we wanted, we could just dump every .deb
> ever straight in pub/debian/pool and expect the filesystem to cope with
> that too.

  cd x ; ls
  ls -d x*

  Dumping every .deb into pool/ would make the server lock up badly
(for minutes) every time someone did an ls. Even ReiserFS would likely suffer
under that scenario.

   As someone else posted, the current hash that you and Jason came up
with is still quite suboptimal: a third of the packages end up in pool/x/
anyway. A factor of 3 isn't much of a gain for an entire hierarchy level. As I said,
I just don't see the need, but it's coded and works. Changing the hash
every few months will spray new packages all over the database, so if we
can't come up with a better hash (intuitive and well distributed) I think
we're better off not choosing one, especially since we don't need one now
and are less likely to need one in the future.
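   To make the distribution question concrete, here is a sketch of the hash quoted above (first character of the source name, or first four characters for library sources), plus a helper that reports what fraction of packages land in the most crowded hash directory. It assumes "library source" means a name starting with "lib"; the package names in the usage lines are made up, so run it over a real source list to check figures like the one third above.

```python
from collections import Counter

def pool_prefix(source: str) -> str:
    """Hash directory for a source package: 4 chars for lib*, otherwise the first char."""
    # Assumption: a "library source" is one whose name starts with "lib".
    return source[:4] if source.startswith("lib") else source[:1]

def pool_path(component: str, source: str, filename: str) -> str:
    """Path of a file under the proposed pool/<component>/<hash>/<source>/ layout."""
    return f"pool/{component}/{pool_prefix(source)}/{source}/{filename}"

def largest_bucket_share(sources: list[str]) -> float:
    """Fraction of source packages that fall into the single most crowded hash directory."""
    counts = Counter(pool_prefix(s) for s in sources)
    return max(counts.values()) / len(sources)

print(pool_path("main", "libfoo", "libfoo-dev_1.2-1_i386.deb"))
# pool/main/libf/libfoo/libfoo-dev_1.2-1_i386.deb
print(largest_bucket_share(["apache", "ash", "bash", "libc6"]))  # 0.5
```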

> 
> >    Separating main/contrib/non-free is problematic still though, as the .dsc
> > files do not list a section, and source files are installed in the first
> > pass before the .debs are (completely avoids the case where .debs have no
> > source).
> 
> It's the .changes file dinstall looks at to decide where things go,
> though.  So you either get an upload to a "Distribution: non-free",
> or the files get marked to go into "non-free/blah".
> 

   pinstall doesn't consult the .changes files for source files, just the
.dsc. The only additional interesting information in the .changes file
is the Section: field. Getting at the .changes file before uploading the .dsc to
the pool is messy. There's also the likelihood that we'll have pure source
packages. Signed .changes and signed .dsc files are redundant: I initially
verified the signatures twice and tied .dscs to their .changes files. The
additional complexity was causing problems, so I chose to make .dsc files
authoritative and just ignore the source-file entries in .changes files.
Perhaps this is something that needs to be worked on, but I'd rather
concentrate on the automatic promotion scripts.
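   For comparison, this is roughly what reading the component out of the .changes file involves. Each line of a .changes Files: field lists `md5sum size section priority filename`, and under the dinstall convention Anthony describes, a section such as `non-free/games` places the file in non-free while a bare section means main. A sketch only, assuming the signature armour has already been stripped and only main/contrib/non-free prefixes occur; it is not pinstall's code.

```python
def components_from_changes(changes_text: str) -> dict[str, str]:
    """Map each file in a .changes Files: field to the component implied by its section."""
    result: dict[str, str] = {}
    in_files = False
    for line in changes_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
            continue
        if in_files:
            if not line[:1] in (" ", "\t"):
                break  # an unindented line starts the next field
            _md5, _size, section, _priority, name = line.split()
            # "non-free/games" -> "non-free"; a bare "games" -> "main".
            result[name] = section.split("/", 1)[0] if "/" in section else "main"
    return result

sample = (
    "Format: 1.7\n"
    "Source: foo\n"
    "Files:\n"
    " 0123 1000 non-free/games optional foo_1.0-1.dsc\n"
    " 89ab 3000 non-free/games optional foo_1.0-1_i386.deb\n"
)
print(components_from_changes(sample))
```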

> As far as existing .debs go, all you need to do is look at what directory
> they're located in right now.

   Yep.
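   As a sketch of that: assuming the usual `dists/<distribution>/<component>/binary-<arch>/...` layout for .debs that dinstall has already installed (other layouts are not handled here), the component is simply the third path element.

```python
def component_from_path(path: str) -> str:
    """Component of an already-installed .deb, read from its dists/ path."""
    # e.g. dists/potato/non-free/binary-i386/games/foo_1.0-1_i386.deb -> "non-free"
    parts = path.split("/")
    if len(parts) < 5 or parts[0] != "dists":
        raise ValueError(f"not a dists/ path: {path}")
    return parts[2]

print(component_from_path("dists/potato/non-free/binary-i386/games/foo_1.0-1_i386.deb"))
```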

> 
> Having sources that build non-free and main packages isn't possible: if
> the license for the source is DFSG-free, you can build free binaries. If
> it's not the source shouldn't go in main in the first place. DFSG-free
> sources that build DFSG-free packages, only some of which depend on
> non-free software is more plausible, but we're already tending to demand
> they be split anyway, AIUI.
> 

   Take an app licensed under BSD terms. Link it against Motif and it's non-free;
link it against LessTif and it can go into main. Debian may not have any source
packages that do this at present, but they're entirely possible, so I mentioned
that they would be a problem. The easiest solution is to not support them.

> 
> A license interpretation change means the package should *not*
> be in the component it was previously in, not that it should
> suddenly be in two components. Dual free/non-free sources aren't
> reasonable. Non-distributable things can't be packaged.

   Allowing overlaps makes the transition much smoother when something
moves.  It's also useful if you have a derivative archive and want to 
keep your archive in sync with Debian/main, while making sure things you
depend on aren't deleted by Debian.  Non-US, Helix, KDE, Stormix, or Corel
could all use this feature.  It's a feature.

>  
> > CREATE TABLE pool (
> >         distribution INT4 NOT NULL REFERENCES distribution,
> >         deb     INT4 NOT NULL REFERENCES deb,
> >         arch    INT4 NOT NULL REFERENCES arch,
> >         section TEXT,
> >         install TIMESTAMP
> > );
> 
> This really isn't helpful at all. All the integers don't interest me
> at all. What exactly is all this stuff, what are the tables that are
> apparently referenced, what are the primary and secondary keys, and why
> did you split it like you have? These are the questions that you need
> to answer if you want the SQL stuff you've written to be taken seriously.

   This was in response to a requirement that there be a table listing these
things, so I quoted the table. NOT NULL REFERENCES means that each of
the integers is the primary key of a row in another table, and ensures that
the row in that other table cannot be deleted unless the referencing entry in
the pool table is deleted first. It also demonstrates that the section field
can be stored in the pool table, and can therefore be overridden differently
in each distribution. section is up for grabs by the release managers; the
pool doesn't use it for anything at present.

> Having code's all very well, but if all you're going to say beyond that is
> "take it or leave it", it's probably just going to be left.
> 
> > > It's probably suboptimal to have to have separate incoming queues.
> > > The above layout basically just means you have to construct a new
> > > "Component:" field for the all-packages-in-the-pool table.
> >     Which means new uploads of every package?  This wouldn't be required
> > with separate upload queues and separate archives - just don't process the
> > wrong set of packages when preloading each archive pool with the old
> > archives.
> 
> Huh? It's trivial to work out which component each .deb is in right
> now: you just look at its path, or the path of the Packages files that
> reference it.

   The problem isn't old .debs, it's new .dscs. They're loaded in first.
There may not be any alternative to parsing both .changes and .dsc for
source uploads, but I'm trying to avoid it. Adding a Component: field to
the .debs or .dscs means changing them, which breaks the signature and would
require a new upload to maintain the signed chain back to the maintainer,
short of a whole new signed .deb format, which would also require a new
upload. We could just change them and re-sign them like an autobuilder does,
but that still changes them and triggers a huge mirror hit.
   Maintaining the signature is required in order to recreate the database
in the event of a flush, and it's a good idea anyway to keep mirrors honest
and the security-paranoid happy.
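   The reason editing is ruled out, in miniature: a signature covers a digest of the exact bytes, so any edit, even one added header line, produces a different digest, and every checksum recorded elsewhere for that file goes stale too. The sketch uses MD5 (the checksum carried in .changes Files: lines) purely to show the bytes changing; it does not verify a real OpenPGP signature, and the .dsc content is made up.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()

# Hypothetical, abbreviated .dsc body (signature armour omitted).
original = b"Format: 1.0\nSource: foo\nVersion: 1.0-1\n"
# The same file after inserting a Component: field.
edited = b"Format: 1.0\nSource: foo\nComponent: non-free\nVersion: 1.0-1\n"

print(md5_hex(original) == md5_hex(edited))  # False: the signed digest no longer matches
```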

> > > If experimental .debs are to be included in the pool, the above would
> > > probably imply we'd end up with:
> > > 	dists/experimental/{main,contrib,non-free}
> > > which probably isn't a bad thing.
> >    I'm currently including them in the main pool, they just only get listed
> > in the dists/experimental/*/Packages.gz files and nowhere else.  It probably
> > won't happen often, but experimental packages might occasionally end up
> > being good enough for the stable track.
> 
> Experimental packages supposedly operate under the constraint that the
> version of the package in unstable is strictly greater than the corresponding
> version in unstable, when that's violated, the package should disappear from
> experimental.

   I think one of the "unstable"s above should be "experimental", and yes, this
is the way it works right now: a new upload to unstable (anything not
experimental) will cause the experimental package to drop out of the
'experimental' distribution in the pool. If no release manager has manually
added that package to another release, the .deb and source will eventually be
deleted by pclean, but not instantly, to avoid invalidating a Packages.gz file
on a distant mirror that gets caught in the middle of an archive update.
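   A toy model of the behaviour just described, under stated assumptions: listings are kept as source → version per distribution, any upload outside experimental drops that source from experimental, and a file is only reported for deletion once it has been unlisted everywhere for a grace period. The `Pool` class and the two-day `GRACE` value are hypothetical illustrations, not pinstall or pclean code.

```python
from datetime import datetime, timedelta

GRACE = timedelta(days=2)  # hypothetical delay before an unlisted file may be deleted

class Pool:
    def __init__(self):
        self.listed = {}       # distribution -> {source: version}
        self.orphaned_at = {}  # (source, version) -> when it stopped being listed anywhere

    def _is_listed(self, source, version):
        return any(d.get(source) == version for d in self.listed.values())

    def _unlist(self, dist, source, now):
        version = self.listed.get(dist, {}).pop(source, None)
        if version is not None and not self._is_listed(source, version):
            self.orphaned_at[(source, version)] = now

    def upload(self, dist, source, version, now):
        if dist != "experimental":
            # Any upload outside experimental drops the package from experimental.
            self._unlist("experimental", source, now)
        self._unlist(dist, source, now)  # an older version in dist is replaced
        self.listed.setdefault(dist, {})[source] = version
        self.orphaned_at.pop((source, version), None)  # listed again, so not orphaned

    def clean(self, now):
        """Return (and forget) files unlisted for at least GRACE; these may be deleted."""
        doomed = sorted(k for k, t in self.orphaned_at.items() if now - t >= GRACE)
        for k in doomed:
            del self.orphaned_at[k]
        return doomed

t0 = datetime(2000, 8, 1)
p = Pool()
p.upload("experimental", "foo", "2.0-1", t0)
p.upload("unstable", "foo", "1.5-1", t0)   # foo 2.0-1 drops out of experimental
print(p.clean(t0 + timedelta(days=1)))     # [] : still inside the grace period
print(p.clean(t0 + timedelta(days=2)))     # [('foo', '2.0-1')]
```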

> 
> >    Under the current implementation dists/ still holds all of the old
> > dinstalled .debs and sources.  New uploads (and all of the .changes files)
> > go to pool/.  New Packages/Sources/Contents files are generated and placed
> > in dists/ as well.  Eventually dists/ will be emptied of .debs and sources
> > with no flag day.
> 
> Yeah, well, that goes without saying. Remirroring the whole archive in
> a day isn't reasonable.
> 

   There was more in this than you read. The packagepool doesn't use any
symlinks. After it takes over generation of the Packages files, they could
all be removed. "Eventually" also meant the long term, until stable
distributions are deleted entirely from the master ftp server and moved to a
different location. "Flag day" didn't refer to a day when everything changed;
it referred to the day they finish changing. The other proposals I've seen for
package pools require moving all .debs into the pool/ directory and putting
them in hashes, while keeping everything running using symlinks. Under this
packagepool they never have to move at all. They can be moved, but they can
also just sit where they are forever. Moving them costs mirrors whether it
happens in one day or 300.
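   Because Packages files record each .deb's location in their Filename: field, a generator only needs whatever path the file table holds: old files can stay under dists/ while new uploads live under pool/, with no symlinks. A minimal sketch with made-up rows and only a subset of the real Packages fields:

```python
def packages_stanza(package, version, arch, filename, size, md5sum):
    """One Packages-file entry; Filename is wherever the file currently lives."""
    return (
        f"Package: {package}\n"
        f"Version: {version}\n"
        f"Architecture: {arch}\n"
        f"Filename: {filename}\n"
        f"Size: {size}\n"
        f"MD5sum: {md5sum}\n"
    )

# Hypothetical file-table rows: one .deb never moved out of dists/,
# one newer upload installed straight into pool/.
rows = [
    ("bar", "0.9-2", "i386", "dists/potato/main/binary-i386/utils/bar_0.9-2_i386.deb", 1234, "0" * 32),
    ("foo", "1.2-1", "i386", "pool/main/f/foo/foo_1.2-1_i386.deb", 5678, "f" * 32),
]
# Stanzas are separated by a blank line.
packages = "\n".join(packages_stanza(*row) for row in rows)
print(packages)
```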
