Re: [PATCH] proposed v3 source format using .git.tar.gz
On Sat, Oct 06, 2007 at 10:37:48PM +0000, Colin Watson wrote:
> On Fri, Oct 05, 2007 at 07:16:13PM -0400, Joey Hess wrote:
> > I've been working on making dpkg-source support a new source package
> > format based upon git. The idea is that a source package has only a
> > .dsc and a .git.tar.gz, which is just a git repo.
> Still, this work looks pretty cool, and I'd like to be able to make use
> of it despite avoiding git whenever I can. I noticed that you'd
> helpfully structured your changes such that it would be possible to plug
> in a different revision control system, so I wrote a module to support
> bzr. The patch is attached to this e-mail, and I'd appreciate comments;
> if this work is merged into dpkg I'd be very happy if my addition were
> merged too. There are probably some improvements to be made, but it was
> really utterly trivial; I was impressed that I didn't have to touch
> anything else beyond plugging in a new module. Ironically, of course, I
> did use git to create it. :-)
I guess if we use Joey's idea at all we will not be able to avoid
shipping such a module for each distributed VCS, and I didn't get
the impression that Joey thought otherwise. So I find your mail
strangely defensive :)
The code itself looks good AFAICT.
> While working on this I was thinking about general issues with the
> format. It seems to me that it's suboptimal not to ship a working tree.
> I know you sort of address this in the wiki FAQ, and I realise that
> there are space advantages to only shipping the VCS data. However, I'd
> like to try to persuade you otherwise if I can. My concerns are:
Shipping the worktree essentially means defining this new format as
an optional add-on, since you ship all the data you ship now plus some
VCS metadata. So all packages will have to be bigger than they
are now (unless we use compression methods other than gzip; after
actually building some packages today with my dpkg-source -C patch
I have to say I'm impressed by how much space we might be able to save,
at a high CPU cost, though). This is not really an argument for either
side, I just wanted to make this effect clear.
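To give a feel for the size versus CPU trade-off, here is a rough sketch.
It is not my dpkg-source patch, just Python's standard compressors run
over made-up sample data; the function name and the sample are invented
for illustration, and real source tarballs will behave differently.

```python
import bz2
import gzip
import lzma
import time


def compare_compressors(data: bytes) -> dict:
    """Return {name: (compressed_size, seconds)} for each stdlib compressor."""
    compressors = {
        "gzip": lambda d: gzip.compress(d, compresslevel=9),
        "bzip2": lambda d: bz2.compress(d, compresslevel=9),
        "lzma": lambda d: lzma.compress(d),
    }
    results = {}
    for name, compress in compressors.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Made-up stand-in for a source tree, not a real package.
    sample = b"".join(
        b"int f%d(void) { return %d; }\n" % (i, i) for i in range(20000)
    )
    print(f"raw    {len(sample):8d} bytes")
    for name, (size, secs) in compare_compressors(sample).items():
        print(f"{name:6s} {size:8d} bytes  {secs:.3f}s")
```

Running it on an actual unpacked source tree (tarred up first) would be
the more honest test; the numbers from synthetic input only show the
general direction.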
> * Users will need to have the VCS installed in order to inspect the
> * Buildds will need to have the VCS installed in their base system.
> * Some source packages want to ship non-VCS-managed files.
Is the last one really such a big problem in Debian? I know that many upstream
VCS repositories don't contain autogenerated files, but most .orig.tar.gz's
already contain them today, so I would have guessed people either only have
their debian/ in their Debian VCS or all upstream files from the
> There are two obvious modifications to Joey's proposal that would allow
> shipping a working tree. The first is just to include the working tree
> in the .$VCS.tar.gz object. This has the advantage of being trivial to
> implement on top of the current code: the git module would need to do a
> 'git checkout' after copying the .git, and the bzr module just wouldn't
> call 'bzr remove-tree'.
This would be a bad idea IMHO, and feels like a regression: instead of
shipping a .orig.tar+diff we now ship one monolithic (bigger) tarball?
Sounds suboptimal. I'm pretty sure I don't want to see this one
implemented in dpkg-dev.
> The second possibility seems to me to be more flexible, though, and
> probably not all that hard to implement: build both a .tar.gz
> (containing the working tree) and a .$VCS.tar.gz, and teach 'dpkg-source
> -x' to unpack the tree given at least one of these. This would allow
> various interesting possibilities such as:
Since you're essentially demoting the new format to an add-on, why not
really make it one and just ship a real Format: 1.0 package
(i.e. orig-tar+diff or native-tar) instead of this half-half-working-tree-tarball?
> These seem to me to be non-trivial advantages that outweigh the space
> costs of shipping around the working tree. I'd be willing to have a go
> at implementing this once I've had a bit more sleep.
> Does any of this make sense?
I guess there are two aspects to Joey's proposal:
1) Make the source package more useful by including VCS metadata like
2) Make it easier to include arbitrary changes to the upstream sources
by using more advanced tools than diff/patch, i.e. a DVCS
By concentrating on the first point and making it optional, you either have
to sacrifice point 2 by reusing the old source package (orig+diff) or give
people who choose not to download the VCS data a worse experience by
making it harder for them to find the actual diff (working tree tar).
On second thought, you can reduce the regression by adding a pristine-gz
delta to the working tree so that you can split the working tree tarball
back into an orig+diff.
On third thought, who says you have to fall back to Format 1.0 for the
non-VCS data? You could also fall back to Format 2.0, which would make
preserving advantage 2 easier.
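To make the "VCS data plus fallback" idea concrete, here is a minimal
sketch of how an unpacker might choose what to use. It is not code from
dpkg-source or from either patch; the function and the suffix table are
hypothetical, ".bzr.tar.gz" is my guess at the bzr equivalent of
".git.tar.gz", and preferring the VCS data when the tool is installed is
an assumption rather than something anybody has agreed on.

```python
import shutil

# Suffix -> tool needed to check out a working tree. ".git.tar.gz" is from
# the proposal; ".bzr.tar.gz" is an assumed name for the bzr equivalent.
VCS_SUFFIXES = {".git.tar.gz": "git", ".bzr.tar.gz": "bzr"}


def _tool_installed(tool):
    """True if the given VCS command is found on PATH."""
    return shutil.which(tool) is not None


def choose_unpack_source(files, tool_installed=_tool_installed):
    """Pick which files of a source package to unpack from.

    Returns ("vcs", [tarball]) or ("plain", [files...]).
    """
    vcs = []    # (filename, tool) pairs
    plain = []  # ordinary tarballs and diffs (the non-VCS fallback)
    for name in files:
        for suffix, tool in VCS_SUFFIXES.items():
            if name.endswith(suffix):
                vcs.append((name, tool))
                break
        else:
            if name.endswith((".tar.gz", ".diff.gz")):
                plain.append(name)

    # Prefer the VCS data when the matching tool is available.
    for name, tool in vcs:
        if tool_installed(tool):
            return ("vcs", [name])
    # Otherwise fall back to the non-VCS part, if one was shipped.
    if plain:
        return ("plain", sorted(plain))
    if vcs:
        raise RuntimeError("only VCS data shipped and no matching tool installed")
    raise ValueError("nothing to unpack")
```

With a package that ships both parts, a machine without git would get
("plain", [...]) and could build from orig+diff, while a developer with
git installed would get the repository tarball.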
So, no idea if my ramblings made any sense, either ;)
Frank Lichtenheld <email@example.com>