Re: [PATCH] proposed v3 source format using .git.tar.gz
On Sat, Oct 06, 2007 at 10:09:22PM -0400, Joey Hess wrote:
> Colin Watson wrote:
> (So, FWIW, I'm not sold on git. Not sold at all yet. But it was a good
> choice for this implementation for several reasons.)
(I don't think bzr is perfect either, of course; the lack of shallow
branches (see below) is one flaw that's very relevant to this
application. If there were a distributed VCS that were clearly better
than the others in every respect, we'd probably all know about it ...)
> > Still, this work looks pretty cool, and I'd like to be able to make use
> > of it despite avoiding git whenever I can. I noticed that you'd
> > helpfully structured your changes such that it would be possible to plug
> > in a different revision control system, so I wrote a module to support
> > bzr.
> Nice. The FAQ has some questions aimed at adding other revision control
> systems, could you try to answer those in the context of bzr? In
> particular, is the data that would be shipped in the source package the
> same data that bzr normally reads from untrusted sources, thus ensuring
> that using it this way is equally (in)secure as using bzr to pull data
> over the network? (Note that this wasn't 100% true for git and I have
> had to put in several workarounds.)
I believe so; bzr has hooks but AFAICS they're only exposed to plugins
(i.e. code that goes in /usr or in ~/.bazaar/plugins) rather than being
something that lives in the .bzr directory. I don't know of anything
executable in .bzr. I intentionally used 'bzr branch' to create the data
that will be shipped, which is the same command used to branch from a
network repository, so I believe that if there is a security flaw in
this implementation then it would also be a security flaw in bzr itself.
The only thing I really needed to tweak was removing a couple of bits of
metadata that aren't useful in this context: branch-name, which ended up
containing blah.bzr.tar.gz.tmp or something like that (and which will be
detected from the unpacked directory name if the file doesn't exist);
and parent, which just records the directory 'bzr branch' branched from.
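For illustration, the steps described above (branch, drop the two
metadata files, then tar up the result) might be sketched in Python as
follows. The on-disk location of the metadata files (.bzr/branch/branch-name
and .bzr/branch/parent) is an assumption about the branch layout, and the
helper names are hypothetical, not taken from the patch.

```python
import os
import subprocess
import tarfile

# Metadata files to drop. Their location under .bzr/branch/ is an
# assumption about the branch layout, not something taken from the patch.
UNWANTED_METADATA = ("branch-name", "parent")


def make_branch_copy(src, dest):
    """Create the shippable copy with 'bzr branch' (requires bzr installed)."""
    subprocess.run(["bzr", "branch", src, dest], check=True)


def strip_metadata(branch_dir):
    """Remove metadata that only describes the local build environment.

    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in UNWANTED_METADATA:
        path = os.path.join(branch_dir, ".bzr", "branch", name)
        if os.path.exists(path):
            os.unlink(path)
            removed.append(name)
    return removed


def pack(branch_dir, tarball):
    """Write branch_dir into a gzipped tarball under its own basename."""
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(branch_dir, arcname=os.path.basename(branch_dir))
```

make_branch_copy needs bzr itself, so without it only the stripping and
packing steps can be exercised.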
> And is the data format stable and/or one that bzr has a history of
> supporting old versions of in a way that ensures backwards compatibility?
The data format has changed a few times, but so far bzr has an excellent
history of continuing to support old versions. Some data formats (dating
from 0.8 or so) are marked as unsupported and you have to use 'bzr
upgrade' before doing anything else. Everything else at worst nags you
to run 'bzr upgrade'.
I think they may have dropped support for some very old formats that
basically only some early bzr developers used.
> Also, will the bzr repos always contain the full history, or is there
> an equivalent to git shallow clones? How big do they tend to be?
I don't have as comfortable an answer here. There's no equivalent to git
shallow clones yet; there's only a design
(http://bazaar-vcs.org/HistoryHorizon), so this will probably get fixed
one day. At present the .bzr directory tends, if anything, to be a little
bigger than the source.
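A rough way to check that size claim for a particular branch is to
compare the bytes under .bzr with the bytes in the rest of the tree. This
is a hypothetical Python helper; it measures uncompressed on-disk sizes,
so the ratio inside a compressed tarball may well differ.

```python
import os


def tree_size(root, skip=None):
    """Sum the sizes of regular files under root.

    If skip names a top-level directory (e.g. ".bzr"), it is pruned.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root and skip in dirnames:
            dirnames.remove(skip)  # prune so the walk never enters it
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def vcs_overhead(root, vcs_dir=".bzr"):
    """Return (vcs_bytes, source_bytes, vcs_bytes / source_bytes)."""
    vcs = tree_size(os.path.join(root, vcs_dir))
    src = tree_size(root, skip=vcs_dir)
    return vcs, src, (vcs / src if src else float("inf"))
```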
I think that, due to historical performance issues, people tend not to
use bzr much on very large trees yet, so I'm hoping this won't be an issue
for a while; whereas the git backend has the immediate prospect of
> > * Some source packages want to ship non-VCS-managed files.
> > It's very common for source packages to include autogenerated
> > objects like configure, Makefile.in, etc. Whether to check these
> > into a VCS is a somewhat religious matter (as acknowledged by the
> > gettext info documentation, for instance), and personally I lean
> > towards checking them in (with a few exceptions) just because it
> > makes it easier to see when they change and keep an eye out for
> > oddities, but I know that a lot of developers prefer to keep these
> > outside their VCS. Shipping a working tree would make it easier to
> > handle cases like this.
> Hmm, I hadn't considered that this might be a problem.
> I don't know if I'd want to write the code to do this, but shipping a
> partial working tree consisting of just those files would be enough to
> solve this.
That ought to be relatively straightforward; just list all the files
that the VCS knows about and unlink them. It does seem untidy, though.
Maybe put them in a separate directory (.bzr-extra-files or something)
which is copied over after unpack, and make it a dpkg-source -b option
rather than the default behaviour?
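The separate-directory variant suggested above might look roughly like
this in Python. .bzr-extra-files is only the name floated here, and the
caller is assumed to supply the set of versioned paths from the VCS
itself; both helpers are hypothetical sketches, not dpkg-source code.

```python
import os
import shutil

EXTRA_DIR = ".bzr-extra-files"  # hypothetical name, as floated in the mail


def stash_extra_files(tree, versioned):
    """Copy files the VCS does not track into tree/EXTRA_DIR.

    `versioned` is a set of POSIX-style paths relative to tree; how it is
    obtained from the VCS is left as an assumption.
    """
    stashed = []
    for dirpath, dirnames, filenames in os.walk(tree):
        rel_dir = os.path.relpath(dirpath, tree)
        if rel_dir == ".":
            # Never descend into VCS metadata or a previous stash.
            for skip in (".bzr", EXTRA_DIR):
                if skip in dirnames:
                    dirnames.remove(skip)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if rel in versioned:
                continue
            dest = os.path.join(tree, EXTRA_DIR, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(os.path.join(dirpath, name), dest)
            stashed.append(rel)
    return sorted(stashed)


def restore_extra_files(tree):
    """After unpacking, copy stashed files back over the checkout."""
    src = os.path.join(tree, EXTRA_DIR)
    if os.path.isdir(src):
        shutil.copytree(src, tree, dirs_exist_ok=True)  # Python 3.8+
        shutil.rmtree(src)
```

Copying rather than moving keeps the working tree intact at build time;
the stash directory only disappears on the unpack side.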
FWIW, I was thinking much more of native packages here; non-native
packages already tend to just import the upstream tarball, which usually
contains generated files, and that's probably why this hasn't been a
problem for things like git-buildpackage. If nothing else, there are
several native packages in the d-i tree alone that don't have configure
et al in Subversion.
Alternatively, if people don't agree with me that we should ship the
working tree by default, maybe it could be an option for the few
packages that need it due to generated objects not in revision control,
so that some VCS source packages would have a .tar.gz (or .orig.tar.gz +
.diff.gz) but most wouldn't.
> > * Space-constrained mirrors could conceivably exclude the VCS data if
> > they had to, though we probably wouldn't encourage this.
> > These seem to me to be non-trivial advantages that outweigh the space
> > costs of shipping around the working tree.
> The space constraints seem pretty hard to me. Specifically, I don't want
> to piss the ftpmasters off and get vcs source packages banned from the
> archive. The only saving grace really seems to be that shipping both
> vcs and upstream tar will only double the size of the archive once most
> everything uses the new format, and the archive will have probably
> doubled in size several times over due to other factors before then.
Of course, a number of packages accidentally ship .svn directories and
so on anyway, though I suppose there's a difference between "officially
blessed by dpkg" and "warned against by lintian" ...
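For comparison, detecting accidentally shipped control directories in a
tarball is simple enough; the Python sketch below reports the first VCS
directory component of each member path. The set of directory names is
illustrative, and this is not how lintian actually implements its check.

```python
import tarfile

# Control directory names to look for; illustrative, not exhaustive.
VCS_DIRS = {".svn", ".bzr", ".git", "CVS", ".hg"}


def find_vcs_dirs(tarball):
    """Return sorted member path prefixes that are VCS control dirs."""
    hits = set()
    with tarfile.open(tarball) as tar:  # mode "r" auto-detects compression
        for member in tar.getmembers():
            parts = member.name.split("/")
            for i, part in enumerate(parts):
                if part in VCS_DIRS:
                    hits.add("/".join(parts[: i + 1]))
                    break  # only the outermost control dir matters
    return sorted(hits)
```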
Colin Watson [email@example.com]