
multi-archive support in dak: first report


Here comes my first report on my Google Summer of Code project to
implement multi-archive support in dak:

Getting started

To get started I installed dak locally on my machine.  As dak usually
runs only on stable, I had to patch a few things to get it running on
wheezy; these patches have already been merged.

I also found some dak commands I did not know about yet.  It turned
out they are no longer of any use and are now pending removal.

First steps

While there was an archive table in the database, it was not really
useful to have multiple entries there, as dak did not record which
suite belonged to which archive.  So as a first step I added a
column relating suites to archives.

This allows tools to create files relative to the archive root for the
given suite instead of under a fixed path for all suites (Dir::Root).
I have patched most tools to do so.  Still missing are check-archive,
control-suite, copy-installer, generate-index-diffs and the
daklib/queue.py module.
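The idea of resolving paths per suite can be sketched as follows.  This
is only an illustration of the concept, not dak's actual code: the
archive roots, suite names and the suite_path helper are all invented
for the example.

```python
# Hypothetical sketch: build output paths relative to the archive root
# of a suite, instead of one global Dir::Root for all suites.
# All names here are illustrative, not dak's real API.
import os

# Each archive has its own filesystem root (paths are made up).
ARCHIVES = {
    "ftp-master": "/srv/ftp-master.debian.org/ftp",
    "security": "/srv/security.debian.org/ftp",
}

# With the new column, every suite records which archive it belongs to.
SUITE_ARCHIVE = {
    "unstable": "ftp-master",
    "stable-security": "security",
}

def suite_path(suite, *components):
    """Return a path under the archive root of the given suite."""
    root = ARCHIVES[SUITE_ARCHIVE[suite]]
    return os.path.join(root, "dists", suite, *components)

print(suite_path("unstable", "main", "binary-amd64", "Packages"))
# /srv/ftp-master.debian.org/ftp/dists/unstable/main/binary-amd64/Packages
```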

The next, larger step was to allow the same file to exist in multiple
archives at the same time.  This wasn't possible as the files table
links to a single location, so it had to be replaced by a N:M relation
between files and archives (files_archive_map).  I decided to also
allow files to exist in multiple components (main/contrib/non-free) in
the same archive.  This is not a large change, but helps with moving a
package between components while keeping the same .orig.tar.gz.
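The new N:M relation can be illustrated with a small in-memory SQLite
database.  The table and column names follow the report, but the schema
is simplified and the inserted data is made up:

```python
# Illustrative sketch of the files_archive_map N:M relation between
# files, archives and components.  Simplified schema, invented data.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files      (id INTEGER PRIMARY KEY, filename TEXT UNIQUE);
CREATE TABLE archives   (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE components (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- One file may now appear in several archives and several components.
CREATE TABLE files_archive_map (
    file_id      INTEGER REFERENCES files(id),
    archive_id   INTEGER REFERENCES archives(id),
    component_id INTEGER REFERENCES components(id),
    PRIMARY KEY (file_id, archive_id, component_id)
);
""")
db.execute("INSERT INTO files VALUES (1, 'hello_1.0.orig.tar.gz')")
db.executemany("INSERT INTO archives VALUES (?, ?)",
               [(1, 'ftp-master'), (2, 'security')])
db.executemany("INSERT INTO components VALUES (?, ?)",
               [(1, 'main'), (2, 'contrib')])
# The same .orig.tar.gz in two archives, and in two components of one:
db.executemany("INSERT INTO files_archive_map VALUES (?, ?, ?)",
               [(1, 1, 1), (1, 2, 1), (1, 1, 2)])

rows = db.execute("""
    SELECT a.name, c.name FROM files_archive_map m
    JOIN archives a   ON a.id = m.archive_id
    JOIN components c ON c.id = m.component_id
    WHERE m.file_id = 1 ORDER BY a.name, c.name
""").fetchall()
print(rows)
# [('ftp-master', 'contrib'), ('ftp-master', 'main'), ('security', 'main')]
```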

New problems

While this sounds quite simple, this change has many consequences:

 - An upload may now reference files that are already known to dak, but
   not in the right archive and need to be copied over.

 - Files might be removed from single archives, but we need to make
   sure not to remove sources for binaries in an archive (i.e. this is
   now a per-archive constraint instead of a global constraint).

 - When moving binaries between archives, we have to make sure the
   source is also available in the target archive.
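The second point above, sources becoming a per-archive constraint, can
be sketched like this.  The data structures and helper functions are
invented for illustration only:

```python
# Hedged sketch of the per-archive source constraint: a source may only
# leave an archive if no binary *in that same archive* still needs it.
# All names and data structures here are invented for illustration.
def sources_still_needed(archive_binaries):
    """Set of source names referenced by the binaries in one archive."""
    return {b["source"] for b in archive_binaries}

def can_remove_source(source, archive_binaries):
    """True if no binary in this archive references the source."""
    return source not in sources_still_needed(archive_binaries)

ftp_master = [{"package": "hello", "source": "hello"}]
security = []  # the same source has no binaries left in this archive

assert not can_remove_source("hello", ftp_master)  # still needed here
assert can_remove_source("hello", security)        # but removable here
```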

I started to work on teaching the package installation logic to add
the needed entries in files_archive_map (not so hard) and planned to
later drop the files.location relation, however I came to the
conclusion that another approach might be better.

Next steps

Even if the necessary changes were implemented in process-upload, I
would still need to re-implement parts of it to allow moving packages
between suites/archives from other places or installing packages into
multiple suites, as I need to in order to replace build queues with
regular suites (process-upload and the modules it uses are not really
usable from a different context).  Also the tendency of process-upload
to leave the archive in an inconsistent state in case of bugs and its
"dict-oriented" programming[1] turned out to be quite annoying.

  [1] Large parts use dicts as data structures where it is not clear
      where and when values are set and what they mean.

So I started to work on a module that allows manipulating the archive
in a safe way[2] and is usable as a library.  So far I am progressing
quite well: I can already install and copy packages in normal cases;
there may still be problems with NEW packages and byhand is not yet
implemented.  Some code for removing packages is also written, but not
yet tested.

  [2] That is, it always keeps the archive in a consistent state.
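The consistency goal can be sketched as a transaction-like scope that
either commits all changes of an operation or rolls them back on error.
This mimics the design idea only; the class and its methods are made up
and bear no relation to the actual module's API:

```python
# Sketch of "keep the archive consistent": do all changes for one
# operation inside a scope that commits on success and rolls back on
# error.  The ArchiveTransaction class is invented for illustration.
class ArchiveTransaction:
    def __init__(self):
        self.committed = []  # changes visible in the archive
        self.pending = []    # changes staged by the current operation

    def install(self, package):
        self.pending.append(package)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.committed.extend(self.pending)  # commit on success
        self.pending.clear()                     # otherwise roll back
        return False  # re-raise any exception

t = ArchiveTransaction()
try:
    with t:
        t.install("hello_1.0_amd64.deb")
        raise RuntimeError("simulated bug during processing")
except RuntimeError:
    pass
print(t.committed)  # [] -- nothing half-installed after the failure
```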

There is also one additional change: as one of my goals of the GSoC
project is to remove code duplication, I now also plan to convert
policy queues to regular suites as well (instead of only converting
build queues).  This means less special cases to implement in the new
code (as files can *only* be in archives and nowhere else).  On the
downside it means process-policy and queue-report will also need a few
changes.

My work can be found in my Git repository at [3].  Most of the work
happens in the pu/multiarchive-{1,2} branches.

  [3] https://ftp-master.debian.org/users/ansgar/dak.git

