
Re: Notes from the DebConf Source Format BoF

On Tue, Aug 10, 2010 at 08:27:24PM -0700, Russ Allbery wrote:

Since this is open for discussion, some comments about my take
on some of the discussed points:

> ftp-team is concerned about doing license checks across the entire git
> archive.  Colin points out that we're in the same situation with Alioth for
> redistributability.  However, it is easier to withdraw things from Alioth
> than from the archive.  And redistributability (the legal requirement) is
> a lot less of a bar than what we check for DFSG.

I can see that this could be a legitimate cause for concern, especially
since the history is essentially immutable and if "tainted" will
remain so unless it's deliberately excised and the history is altered.
However, is this a problem in reality, or just theoretical?  If it's
only a problem occasionally, then could this be better dealt with
(with the cooperation of upstream) on a case-by-case basis as and when
this becomes a real issue?

I'm not a fan of shallow clones due to the loss of history--we're
losing out on the main advantage to having a git repo at this point.
I'm not an expert WRT shallow clones, but can you get back to a full
clone given the packaged repo?  "git fetch --depth" appears to do this,
though I don't know how to specify "everything" if you don't know the
remote max depth.  So shallow clones seem fine as a source package
distribution format, but it would be nice to have debcheckout or
dpkg-source (or whatever tool is most appropriate) have the ability
to get back to the original state for actual work (and then strip
it back out again when repacking) without me needing to do the extra
work by hand.
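
On the "how do I specify everything" question: newer git releases provide
"git fetch --unshallow", which fetches the remaining history without a
guessed depth.  The following is only a minimal sketch, assuming a git recent
enough to have --unshallow and "git -C"; the throwaway "upstream" repository
and its three commits are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: turn a depth-1 clone back into a full clone.
set -e
work=$(mktemp -d)

# Hypothetical upstream repository with three commits (demo data only).
git init -q "$work/upstream"
cd "$work/upstream"
for n in 1 2 3; do
    echo "$n" > file
    git add file
    git -c user.name=Demo -c user.email=demo@example.org \
        commit -q -m "commit $n"
done
cd "$work"

# A depth-1 clone, as might be shipped in a source package.
# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 1 "file://$work/upstream" shallow
shallow_count=$(git -C shallow rev-list --count HEAD)

# Fetch the rest of the history without needing to know the remote depth.
git -C shallow fetch -q --unshallow
full_count=$(git -C shallow rev-list --count HEAD)

echo "before: $shallow_count commit(s), after: $full_count commit(s)"
```

A tool such as debcheckout or dpkg-source could run the equivalent of the
fetch step on unpack, provided the original remote is still reachable.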

> - Colin wonders why we don't have a central directory of all the source
>   package packaging repositories rather than putting it in package metadata.
>   * Even with that, if you look at stuff in stable, the chances are that a
>     lot of those repositories have gone away.

Having a central clone of the upstream repo, or at least the Debian
repo for each git-maintained package would go some way to alleviating
some of these concerns.  They could even automatically track upstream/
debian repos, à la Launchpad for bzr.

> - debcheckout is only really useful if you're about to do development
>   * There's no uniform way to get a particular revision of the package.
>   * It may not be tagged, it may be on another branch, etc.

This is the key limitation of debcheckout as it stands.  However, while
I'm currently adding Vcs-Git* to debian/control by hand, could this
be taken directly from "git remote show" in the future to make it
reflect the exact origin?  (Would depend upon direct cloning of the
debian/upstream remote repo though.)  As mentioned, with the addition
of the --branch option to "git clone", it should be possible to specify
the exact branch.
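
As a rough illustration of deriving the field from the repository itself,
here is a sketch that reads the origin URL and current branch and prints a
Vcs-Git line.  The URL and the "debian" branch name are placeholders, and
the "-b branch" suffix is the syntax later accepted by Debian Policy for
Vcs-Git; whether a packaging tool should do this automatically is an open
question.

```shell
#!/bin/sh
# Sketch: build a Vcs-Git field from the repository's own configuration.
set -e
work=$(mktemp -d)
git init -q "$work/pkg"
cd "$work/pkg"

# Placeholder remote and branch, for illustration only.
git remote add origin https://git.example.org/pkg.git
git symbolic-ref HEAD refs/heads/debian

# Read back what the repository says about its origin and branch.
url=$(git config --get remote.origin.url)
branch=$(git symbolic-ref --short HEAD)

vcs_git="Vcs-Git: $url -b $branch"
echo "$vcs_git"
```

The reverse direction is then "git clone --branch debian <url>", using the
two values from the field.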

> Joey would really rather upload his whole repository for things that he
> knows are clean, but that's a problem for ftp-master review, and you have
> to get into who you trust to make that determination.

If I tag a debian release in my repo and sign it with my Debian GPG key,
it should be possible to "upload" the new source package to Debian with
a "git push" (or upload a small .dsc and get a central git repo to
do a pull from me).  It should all be properly verifiable from our GPG
web of trust.  Maybe best restricted to pulling from git.debian.org or
just pulling a single signed tag?

While this might seem somewhat more insecure compared with our
traditional tarball/diff upload mechanism, I don't really see that
myself.  If anything we gain rather more auditing of changes and
accountability if it's correctly signed, compared with a pile of
patches à la quilt/dpatch.  If anything bad does happen, we've got
the entire history right there with author and committer of every

> You might be able to do a shallow clone of depth one and include every signed 
> tag that matches an entry in debian/changelog
> but it may be too bloaty.  That might ease the review.
>  - How would topic branches fit into this scheme?

If they are all merged into the debian release branch then they are
certainly not needed for building, but they would be required for
making changes and so should probably be included.  The dependent
branches should show up as parents to the commit, at least for the
last release, so should be possible to include.  Maybe including
commits all the way back to the previous Debian release would allow
for easier preservation of this detail?
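
To illustrate the point about parents, here is a small sketch showing that a
merged topic branch is recorded as the second parent of the merge commit on
the release branch, so a tool could walk from the release commit to the topic
commits it needs to include.  The branch names "debian" and "topic/fix" and
the file contents are made up for the example.

```shell
#!/bin/sh
# Sketch: a merged topic branch shows up as a parent of the release commit.
set -e
work=$(mktemp -d)
git init -q "$work/pkg"
cd "$work/pkg"
git config user.name Demo
git config user.email demo@example.org

# Hypothetical release branch with one commit.
git symbolic-ref HEAD refs/heads/debian
echo base > base
git add base
git commit -q -m "base"

# Hypothetical topic branch with one change.
git checkout -q -b topic/fix
echo fix > fix
git add fix
git commit -q -m "topic fix"

# Merge into the release branch, forcing a real merge commit.
git checkout -q debian
git merge -q --no-ff -m "Merge topic/fix for release" topic/fix

# %P prints the parent hashes of the commit, separated by spaces.
set -- $(git show -s --format=%P HEAD)
parent_count=$#
first_parent=$1
topic_parent=$2
topic_tip=$(git rev-parse topic/fix)

echo "release commit has $parent_count parents"
echo "second parent: $topic_parent"
```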

> Best practices for Git repository layout?
> - git-buildpackage documentation is closest to that

I would have to disagree here; the git-buildpackage default layout is
far too "Debian-centric".  By naming the Debian and Upstream branches
"master" and "upstream" it's only really useful if you're importing
upstream release tarballs.  We should really be using a "debian" branch
for Debian-specific changes, and possibly even using multiple branches
for tracking oldstable/stable/unstable/experimental work.

If upstream is already using git, you might want to skip the tarball
step and use their git branches directly (and they might have their
own master branch).  Also potentially annoying for our downstreams as

> git push as an upload mechanism
> - Attractive because over time it builds a Git repository for the package
> - However, it assumes binaryless uploads, which we currently don't allow.

This is something to think about for the future though; dropping
binary uploads (by maintainers, not buildds) has been on the cards
for some time now, hasn't it?  Is this still planned?

> If you're implementing 3.0 format, please don't hard-code the extensions that
> you "know" will be found in source packages, because as we add additional
> files listed in *.dsc, we may add other types of files.

We already found this out the hard way in sbuild; hopefully it's now
completely clean--we removed all assumptions about the expected

> What about repository size bloat if revision control history is included?

In practice, a shallow clone is typically only half the size of a
complete clone, so it's not going to eat too much extra archive space.
For schroot:

% du -sk schroot-shallow schroot schroot-full
4372	schroot-shallow
7556	schroot [cloned --depth 1 and then fetched all history]
6008	schroot-full
% du -sk schroot-shallow/.git schroot/.git schroot-full/.git
1720	schroot-shallow/.git
4904	schroot/.git [cloned --depth 1 and then fetched all history]
3356	schroot-full/.git
After repack and gc:
% du -sk schroot-shallow/.git schroot/.git schroot-full/.git
1520	schroot-shallow/.git
2920	schroot/.git
2916	schroot-full/.git
Packaged .git (after repack and gc):
% ls -l schroot*.bz2
-rw-r--r-- 1 rleigh rleigh 2765372 Aug 15 21:50 schroot-full.tar.bz2
-rw-r--r-- 1 rleigh rleigh 1403301 Aug 15 21:50 schroot-shallow.tar.bz2
-rw-r--r-- 1 rleigh rleigh 2764894 Aug 15 21:50 schroot.tar.bz2

So a five year history in this case is slightly less than double the
packed size--not a bad tradeoff for the entire project history (IMO).
Obviously for exceptional cases such as the Linux kernel this might
not be quite so optimal.  Not sure why there's a size difference if
you shallow clone then fetch all, rather than cloning the entire
thing--any history differences or just packed slightly differently?

> Currently in 3.0 (git), origin points to the bundle and doesn't embed the
> actual repository, but Joey is working on fixing that.  (Setting origin
> based on Vcs-Git.)

As I mentioned above, would it make sense to set Vcs-Git based on origin
on packing?  On unpack after debcheckout the opposite may be useful as
you say above.

> source.debian.org is working on importing source packages into a Git
> repository and storing the history as one revision per new source package
> upload.

While useful, don't we already have that if you're properly tagging
all Debian releases in your git repository already?  A central resource
would be useful in case the original repos go offline, but given the
space requirements, storing all the history should be possible, in which
case why not simply track the upstream(s)?

Lastly, one thing I'd like to push with git usage in Debian is
better integration with upstreams.  Rather than repeating it
all here, this is detailed in these mails:


Essentially, *everything* stays in git from upstream to distributed
releases to debian work and releases and also to downstreams.  There's
no import of release tarballs because they are in git too, and there's
no pristine tar because the GPG-signed tag of the distribution *is* the
release.  Currently what an upstream releases as the tarball might not
exactly match the release in the VCS (due to autotools bootstrap, other
generated files etc.) so here "make dist" actually makes a separate
"distribution" branch (as opposed to release) so you have a natural
set of branches:
  development → release → distribution → debian →→ downstream
and at each step you have GPG-signed tags giving you an auditable
chain of trust along the path.

This makes pulling changes from upstream and downstream trivial since
you have a contiguous history from end-to-end and the same applies to
pushing changes back e.g. with rebasing and format-patch.  This is
lost if importing tarballs.  Here, you can merge back a bugfix from
a downstream or debian straight back to the current upstream
development branch and submit it in a jiffy.  And upstreams/downstreams
can directly track what we're doing and cherry-pick changes as well.


  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
