
Re: Potato now stable



On Sun, Aug 27, 2000 at 03:39:25PM +1000, Anthony Towns wrote:
> 
> That doesn't tell me what you mean by "archive". Do you consider contrib
> and non-free to be separate archives as well?
> 

   An archive is an entry in the archive table.  It can mean
main/contrib/non-free (if they have separate upload queues), or it could
mean US/non-US.  That decision is a policy decision that the packagepool
implementation doesn't make.  1 archive == 1 pinstall instance with related
file space.

> 
> Then two different files never actually have
> the same path. Either because they'd be the same file
> (debian/dists/woody/main/binary-i386/foo/bar_1.2-3_i386.deb), or because
> separate archives are called, eg, debian-non-us/dists or woody/non-US/main
> or similar.
> 

   In the event that two archives have two files of the exact same path &
filename, the database can ensure that they are in fact the same file (same
length and checksum), and can thus be freely overlapped.  For instance,
suppose helix had their archive connecting to the same database.  The
database would ensure that the Debian and Helix gtk packages of exactly the
same version and filename couldn't overlap without being identical.  This
would make a combined mirror possible just by copying, and it wouldn't
matter which archive was copied first (as far as .debs and sources were
concerned).  That's all the path integrity check does.
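
   As a rough illustration of that check (a sketch only, not the actual
packagepool code): the index below is a plain dict standing in for the
database, and register_file, PathConflict and the choice of SHA-256 as the
checksum are all assumptions made for the example.

```python
import hashlib


class PathConflict(Exception):
    """Raised when two different files claim the same path."""


def register_file(index, path, data):
    """Record (length, checksum) for path in index (a dict).

    A second registration of the same path is accepted only if the
    length and checksum match what was recorded the first time.
    """
    fingerprint = (len(data), hashlib.sha256(data).hexdigest())
    known = index.setdefault(path, fingerprint)
    if known != fingerprint:
        raise PathConflict(path)
    return fingerprint
```

   With this, two archives registering an identical gtk .deb under the same
path both succeed, in either order, while a differing file at that path is
refused.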

> > > > [...] pool/main and pool/non-free could both be in the same
> > > > directory though, they'd just need separate incoming directories to feed
> > > > them.
> >    We will likely always need master servers in different countries for
> > legal and bandwidth reasons, so I've designed the pool database to
> > record that information (the archive), and to prevent those archives from
> > overlapping files that are not identical.
> 
> But we *don't* need separate servers or separate incoming directories
> for main and non-free right now. Not legally and not technically.
>  

   No, but we need packagepools rather badly.  The Packages.gz files on
mirror.aarnet have been badly out of sync with the packages actually in
their archive every time I've tried to update for the last week.
The easiest way to handle contrib/non-free with the present implementation
is to make contrib/non-free into separate archives (in the packagepool
sense).  They can overlap filestorage so that they continue to look the same
to mirrors and apt-get.  The alternatives are
   1) pinstall gets and trusts component from Section: from .changes
   2) packagepool has a manually maintained overrides-like table specifying
      the component

   Under the packagepool, each distribution generates exactly one
Packages.gz file, so non-free and contrib have to be separate distributions
anyway (woody-contrib for instance).  Distributions can be on separate
Archives.

> >    pinstall doesn't consult the .changes files for source files, just the
> > .dsc.
> 
> Then that's broken...

   It may change anyway.  Not all .dscs have a Source: stanza, which means
the sourcename has to be derived some other way (from the filename or the
.changes files).
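
   A minimal sketch of the filename fallback, assuming the usual
<source>_<version>.dsc naming; source_name is a hypothetical helper, not
something pinstall provides:

```python
import os


def source_name(dsc_text, dsc_filename):
    """Return the source package name for a .dsc.

    Prefer an explicit Source: field; otherwise fall back to the part
    of the filename before the first underscore.
    """
    for line in dsc_text.splitlines():
        if line.lower().startswith("source:"):
            return line.split(":", 1)[1].strip()
    base = os.path.basename(dsc_filename)
    return base.split("_", 1)[0]
```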

> > The only additional interesting information in the .changes file
> > is the Section:   Getting at the .changes file before uploading the .dsc to
> > the pool is messy.
> 
> ...and something in there's poorly designed.

   It means parsing the same file (.changes) twice, once when the .dsc is
parsed, and then again later when the .changes is parsed to verify the
.debs.  The mess is in the overlap of functionality between .dscs and
.changes.  Both have PGP sigs on them (sometimes), both list source files
and checksums.


> 
> > > A license interpretation change means the package should *not*
> > > be in the component it was previously in, not that it should
> > > suddenly be in two components. Dual free/non-free sources aren't
> > > reasonable. Non-distributable things can't be packaged.
> 
> (xacc and xacc-smotif is an existing example of a free source producing
> a non-free binary)

   Thanks, I thought I'd seen some.

> 
> Unfortunately it doesn't tell me what a "distribution" is (is it "woody"?
> is it "woody/non-free"? is it "woody/non-US/non-free"?), it doesn't tell
> me what a "deb" is, it doesn't tell me how "deb" and "arch" interact,
> or what happens for binary-only-recompiles or arch: all packages. It
> doesn't give me any context for anything.

   A distribution is the label associated with a set of packages that go
into one Packages.gz per architecture.  A release (say woody) can get
different architectures out of sync, so it's necessary to allow multiple
versions of binary-all packages.  glibc-data 2.1.13-9_all for ARM and
glibc-data 2.1.13-11_all for i386 for instance while the autobuilders catch
up.  That's the purpose of the arch field in the pool, to allow multiple
'all' packages into different Packages.gz files for the same release but
different architectures.   deb is a reference to the table containing the
information about particular .debs.  In particular it also has an arch.
Unless the deb.arch is 'all' this field and pool.arch should match.
If deb.arch is 'all' then pool.arch is one of the real architectures.
Does this explain adequately what the field is for?
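
   Stated as code, the rule is roughly the following.  This is only a
sketch: pool_entry_ok and the real_arches set are names invented for the
example.

```python
def pool_entry_ok(deb_arch, pool_arch, real_arches):
    """Check the deb.arch / pool.arch rule described above.

    pool.arch must always be a real architecture; deb.arch must either
    match it or be 'all'.
    """
    if pool_arch not in real_arches:
        return False
    return deb_arch == "all" or deb_arch == pool_arch
```

   So an 'all' package can be listed once per real architecture, at a
different version in each if need be, but never under pool.arch 'all'.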


> 
> >    Maintaining the signature is required in order to recreate the database
> > in the even of a flush, and it's a good idea anyway to keep mirrors honest
> > and security-paranoid happy.
> 
> ObVious: so don't flush the database.

   Why do you think not being able to ever flush the database is a feature,
let alone an essential feature?  Allowing flushes was one of my design goals
- I think it's a feature worth having.  Mirrors that don't want to carry the
signed .changes files can choose not to have them.  A mirror could even
un-pgp all of them and distribute those images.  Stripping the signatures at
the master archive though makes them unavailable to any other source, and
also makes flushing and reloading the database impossible (unless the
signatures are stored somewhere, in which case why not store them where they
came from in the first place).

> 
> Alternately, structure the pool layout so you can still work out the
> component based on the filename.
> 
> Adding the Section to a .dsc or ignoring components altogether doesn't
> increase security in any measurable way, either.

   Adding Section to .dsc has nothing to do with security.  Where did you get
that from?

> 
> >    There was more in this than you read.  The packagepool doesn't use any
> > symlinks.
> 
> The only reason symlinks are particularly useful are either to support
> old programs and partial mirrors, or to make it easy so people browsing
> the ftp site don't have to look in a myriad of places for things.



> 
> > After it takes over generation of the Packages files they could
> > all be removed.  Eventually was also long term, until stable distributions
> > are deleted entirely from the master ftp server and moved to a different
> > location.  flag day didn't refer to a day when everything changed, it
> > refered to the day they finish changing.  The other proposals I've seen for
> > package pools require moving all .debs into the pool/ directory and putting
> > them in hashes, while keeping everything running using symlinks.  Under this
> > packagepool they never have to move at all.  They can be moved, but they can
> > also just sit where they are forever.  Moving them costs mirrors whether it
> > happens in one day or 300.
> 
> Moving them happens anyway, when, eg, slink gets moved from ftp-master
> to archive.debian.org (or potato, or woody, or whatever).

   Under the packagepool they don't have to be deleted at all, they really
don't have to move until the active (in the pool) distributions all move to
newer .debs located under pool/.
   To delete slink the archive maintainer would run:
DELETE FROM pool WHERE pool.distribution=
 (SELECT id FROM distribution WHERE name='slink');

   A few days later pclean would delete all of the slink .debs that were no
longer needed by potato or woody.  The rest would stay.  A few days after
that the source files for the deleted .debs would also be cleaned, except
those that were needed by other versions (say pristine source, different
debian versions).
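
   The sequence can be tried out against a toy database.  What follows is
a sketch using Python's sqlite3 with a made-up three-table schema; in
particular the pool.deb column and the helper names are assumptions, and
the real pclean also waits a few days before removing anything, which is
not modelled here.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory pool: slink and potato share one .deb."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE distribution (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE deb (id INTEGER PRIMARY KEY, filename TEXT);
        CREATE TABLE pool (distribution INTEGER, deb INTEGER);
        INSERT INTO distribution VALUES (1, 'slink');
        INSERT INTO distribution VALUES (2, 'potato');
        INSERT INTO deb VALUES (1, 'old_1.0_i386.deb');
        INSERT INTO deb VALUES (2, 'shared_1.0_i386.deb');
        INSERT INTO pool VALUES (1, 1);
        INSERT INTO pool VALUES (1, 2);
        INSERT INTO pool VALUES (2, 2);
    """)
    return db


def delete_distribution(db, name):
    """Same statement as above, with the name passed as a parameter."""
    db.execute(
        "DELETE FROM pool WHERE pool.distribution="
        " (SELECT id FROM distribution WHERE name=?)",
        (name,),
    )


def unreferenced_debs(db):
    """List .debs no remaining distribution refers to (pclean's targets)."""
    rows = db.execute(
        "SELECT filename FROM deb"
        " WHERE id NOT IN (SELECT deb FROM pool)"
        " ORDER BY filename"
    )
    return [filename for (filename,) in rows]
```

   After delete_distribution(db, 'slink'), only the .deb that potato does
not also use shows up in unreferenced_debs(db); the shared one stays.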

> 
> In the long term, moving .debs into a separate directory from the pool
> is useful simply because it means we *don't* have to keep moving them
> around every release.
> 

   One of the essential goals of packagepools is that files stay put.  You
don't have to move old files into new locations to meet this goal; doing so
wastes bandwidth, though it makes the directory structure a little neater.
Whether that extra neatness is worth the price I don't know.  Most of them
will be moved anyway by new uploads and attrition of old distributions; some
day it may be worth doing a dozen NMUs to clean out the .debs in dists/slink.


