On Tue, Aug 10, 2010 at 08:27:24PM -0700, Russ Allbery wrote:

Since this is open for discussion, some comments about my take on some
of the discussed points:

> ftp-team is concerned about doing license checks across the entire git
> archive.  Colin points out that we're in the same situation with Alioth for
> redistributability.  However, it is easier to withdraw things from Alioth
> than from the archive.  And redistributability (the legal requirement) is
> a lot less of a bar than what we check for DFSG.

I can see that this could be a legitimate cause for concern, especially
since the history is essentially immutable and, if "tainted", will remain
so unless it's deliberately excised and the history is altered.  However,
is this a problem in reality, or just theoretical?  If it's only a problem
occasionally, then could this be better dealt with (with the cooperation
of upstream) on a case-by-case basis as and when this becomes a real
issue?

I'm not a fan of shallow clones due to the loss of history--we're losing
out on the main advantage to having a git repo at this point.  I'm not an
expert WRT shallow clones, but can you get back to a full clone given the
packaged repo?  "git fetch --depth" appears to do this, though I don't
know how to specify "everything" if you don't know the remote max depth.

So shallow clones seem fine as a source package distribution format, but
it would be nice to have debcheckout or dpkg-source (or whatever tool is
most appropriate) have the ability to get back to the original state for
actual work (and then strip it back out again when repacking) without me
needing to do the extra work by hand.

> - Colin wonders why we don't have a central directory of all the source
>   package packaging repositories rather than putting it in package metadata.
>   * Even with that, if you look at stuff in stable, the chances are that a
>     lot of those repositories have gone away.

Having a central clone of the upstream repo, or at least of the Debian
repo, for each git-maintained package would go some way to alleviating
some of these concerns.  They could even automatically track
upstream/Debian repos, as Launchpad does for bzr.

> - debcheckout is only really useful if you're about to do development
>   * There's no uniform way to get a particular revision of the package.
>   * It may not be tagged, it may be on another branch, etc.

This is the key limitation of debcheckout as it stands.  However, while
I'm currently adding Vcs-Git* to debian/control by hand, could this be
taken directly from "git remote show" in the future to make it reflect
the exact origin?  (Would depend upon direct cloning of the
debian/upstream remote repo though.)  As mentioned, with the addition of
the --branch option to "git clone", it should be possible to specify the
exact branch.

> Joey would really rather upload his whole repository for things that he
> knows are clean, but that's a problem for ftp-master review, and you have
> to get into who you trust to make that determination.

If I tag a Debian release in my repo and sign it with my Debian GPG key,
it should be possible to "upload" the new source package to Debian with a
"git push" (or upload a small .dsc and get a central git repo to do a
pull from me).  It should all be properly verifiable from our GPG web of
trust.  Maybe best restricted to pulling from git.debian.org or just
pulling a single signed tag?

While this might seem somewhat more insecure compared with our
traditional tarball/diff upload mechanism, I don't really see that
myself.  If anything we gain rather more auditing of changes and
accountability if it's correctly signed, compared with a pile of patches
à la quilt/dpatch.  If anything bad does happen, we've got the entire
history right there with the author and committer of every change.

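As a rough sketch of the signed-tag check described above (this is not an
existing Debian tool): a receiving service could refuse any release tag
that "git verify-tag" cannot validate before it pulls or builds from it.
The function name verify_release_tag and the debian/<version> tag naming
are assumptions made purely for illustration.

```shell
# Sketch only: accept a release tag if "git verify-tag" can validate its
# signature with the keys in the local GnuPG keyring.
# verify_release_tag is a hypothetical helper, not part of git or dpkg.
#
# Usage: verify_release_tag <repo-dir> <tag>
# Prints the commit the tag points at on success; returns non-zero otherwise.
verify_release_tag() {
    repo=$1
    tag=$2

    # The tag must exist in the repository.
    git -C "$repo" rev-parse --quiet --verify "refs/tags/$tag" >/dev/null || {
        echo "no such tag: $tag" >&2
        return 1
    }

    # verify-tag fails for lightweight or unsigned tags, and for signatures
    # that cannot be checked against the local keyring.
    git -C "$repo" verify-tag "$tag" >/dev/null 2>&1 || {
        echo "tag $tag has no valid signature" >&2
        return 1
    }

    # Resolve the tag to the commit that would be pulled or built.
    git -C "$repo" rev-parse "$tag^{commit}"
}
```

A maintainer would create the tag with "git tag -s", and the receiving
side would then call, for example, verify_release_tag /path/to/repo
debian/1.0-1 (a made-up tag name).  Note that "git verify-tag" only
checks the signature against whatever keys are in the local keyring;
restricting acceptance to Debian developer keys would need a keyring
holding only those keys, which this sketch leaves out.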
> You might be able to do a shallow clone of depth one and include every signed
> tag that matches an entry in debian/changelog
> but it may be too bloaty.

That might ease the review.

> - How would topic branches fit into this scheme?

If they are all merged into the Debian release branch then they are
certainly not needed for building, but they would be required for making
changes and so should probably be included.  The dependent branches
should show up as parents of the merge commit, at least for the last
release, so it should be possible to include them.  Maybe including
commits all the way back to the previous Debian release would allow for
easier preservation of this detail?

> Best practices for Git repository layout?
> - git-buildpackage documentation is closest to that

I would have to disagree here; the git-buildpackage default layout is far
too "Debian-centric".  By naming the Debian and upstream branches
"master" and "upstream", it's only really useful if you're importing
upstream release tarballs.  We should really be using a "debian" branch
for Debian-specific changes, and possibly even using multiple branches
for tracking oldstable/stable/unstable/experimental work.  If upstream is
already using git, you might want to skip the tarball step and use their
git branches directly (and they might have their own master branch).
The default naming is also potentially annoying for our downstreams.

> git push as an upload mechanism
> - Attractive because over time it builds a Git repository for the package
> - However, it assumes binaryless uploads, which we currently don't allow.

This is something to think about for the future though; dropping binary
uploads (by maintainers, not buildds) has been on the cards for some time
now, hasn't it?  Is this still planned?

> If you're implementing 3.0 format, please don't hard-code the extensions that
> you "know" will be found in source packages, because as we add additional
> files listed in *.dsc, we may add other types of files.

We already found this out the hard way in sbuild; hopefully it's now
completely clean--we removed all assumptions about the expected
extensions.

> What about repository size bloat if revision control history is included?

In practice, a complete clone is typically only about twice the size of a
shallow clone, so including the history is not going to eat too much
extra archive space.  For schroot:

  % du -sk schroot-shallow schroot schroot-full
  4372    schroot-shallow
  7556    schroot           [cloned --depth 1 and then fetched all history]
  6008    schroot-full

  % du -sk schroot-shallow/.git schroot/.git schroot-full/.git
  1720    schroot-shallow/.git
  4904    schroot/.git      [cloned --depth 1 and then fetched all history]
  3356    schroot-full/.git

After repack and gc:

  % du -sk schroot-shallow/.git schroot/.git schroot-full/.git
  1520    schroot-shallow/.git
  2920    schroot/.git
  2916    schroot-full/.git

Packaged .git (after repack and gc):

  % ls -l schroot*.bz2
  -rw-r--r-- 1 rleigh rleigh 2765372 Aug 15 21:50 schroot-full.tar.bz2
  -rw-r--r-- 1 rleigh rleigh 1403301 Aug 15 21:50 schroot-shallow.tar.bz2
  -rw-r--r-- 1 rleigh rleigh 2764894 Aug 15 21:50 schroot.tar.bz2

So including a five-year history in this case slightly less than doubles
the packed size--not a bad tradeoff for the entire project history (IMO).
Obviously for exceptional cases such as the Linux kernel this might not
be quite so optimal.  Not sure why there's a size difference if you
shallow clone then fetch all, rather than cloning the entire thing--any
history differences or just packed slightly differently?

> Currently in 3.0 (git), origin points to the bundle and doesn't embed the
> actual repository, but Joey is working on fixing that.  (Setting origin
> based on Vcs-Git.)

As I mentioned above, would it make sense to set Vcs-Git based on origin
on packing?  On unpack after debcheckout the opposite may be useful, as
you say above.

> source.debian.org is working on importing source packages into a Git
> repository and storing the history as one revision per new source package
> upload.

While useful, don't we already have that if you're properly tagging all
Debian releases in your git repository?  A central resource would be
useful in case the original repos go offline, but given the modest space
requirements shown above, storing all the history should be possible, in
which case why not simply track the upstream(s)?

Lastly, one thing I'd like to push with git usage in Debian is better
integration with upstreams.  Rather than repeating it all here, this is
detailed in these mails:

  http://lists.alioth.debian.org/pipermail/vcs-pkg-discuss/2009-August/000663.html
  http://thread.gmane.org/gmane.comp.sysutils.automake.general/10936

Essentially, *everything* stays in git, from upstream to distributed
releases to Debian work and releases and also to downstreams.  There's no
import of release tarballs because they are in git too, and there's no
pristine-tar because the GPG-signed tag of the distribution *is* the
release.  Currently what an upstream releases as the tarball might not
exactly match the release in the VCS (due to autotools bootstrap, other
generated files etc.), so here "make dist" actually makes a separate
"distribution" branch (as opposed to release), so you have a natural set
of branches:

  development → release → distribution → debian → downstream

and at each step you have GPG-signed tags giving you an auditable chain
of trust along the path.  This makes pulling changes from upstream and
downstream trivial since you have a contiguous history from end to end,
and the same applies to pushing changes back, e.g. with rebasing and
format-patch.  This is lost if importing tarballs.  Here, you can merge a
bugfix from a downstream or from Debian straight back to the current
upstream development branch and submit it in a jiffy.  And
upstreams/downstreams can directly track what we're doing and cherry-pick
changes as well.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.