
Re: RFC: DEP-14: Recommended layout for Git packaging repositories



On Sat, Nov 15, 2014 at 06:15:33PM +0000, Simon McVittie wrote:
> On 12/11/14 22:07, Ron wrote:
> > I am also interested to hear more
> > about whatever the confusion was that you had with this when you
> > started working with Tollef's systemd repo that you mentioned
> > in the previous thread.
> 
> Having played with gitpkg some more, I'm reminded that the answer to
> this is that unlike (AIUI) both gbp-pq and git-dpm, it did not meet my
> assumption that the contents of the git tree were in a suitable form to
> run dpkg-buildpackage and have a 3.0 (quilt) Debian package fall out. I
> realise that's partly a property of 3.0 (quilt).

Ah, yes.  That probably is something we could document a bit better.
I guess it got overlooked, partly because there was no format 3 when
gitpkg was first written (and this does work just fine for format 1,
or the single patch style of format 3), and partly because by the
time this was an issue the common workflow was probably not to build
packages in the (potentially dirty) working tree anyway, so it took
someone new coming along to notice it.  This is actually the first
time anyone has mentioned tripping over this to me.

You should still be guaranteed to get a functionally correct package if
you do this, but not necessarily a source package with the patches in
debian/patches split out individually.  However ...


> For gitpkg, you can commit in the normal git way, but the cost is that
> you have to build in a way that isn't the normal dpkg thing (exporting
> with gitpkg and building the result).

... if you want to build in the local system like this (as opposed
to throwing them off to a buildd chroot), you can enable the dpkg-bp
hook, at which point something like `gitpkg master` will do the
same things as dpkg-bp in the local tree with the added advantage
of also respecting your .gitattributes et al. and ignoring any
dirty state you might have in the working tree.

Which on the one hand, if you know this, is just as easy to do, if not
easier, and gives you much stronger guarantees about what
you're actually building -- but on the other, if you don't, is
something you need to become aware of somehow.

The latter of which is probably somewhat unavoidable for just about
any tool if you're actually planning to push your changes to the
repo and not stuff up later users of the tool - but I agree this
is another thing it's worth thinking more about to make things
easier for more casual users.


Do you have any suggestions for something that might have made this
more immediately obvious to you?  I can think of a few things we
could do, but I really do believe in designing things around actual
user experiences rather than trying to guess blindly about problems
nobody has actually ever had.

With the new tools for exporting a patch series, it would be possible
to export one into the (henceforth) 'dirty' working tree and build
directly in it with dpkg-bp as you tried.  And for people who really
wanted to, it would even be possible to commit that to a branch where
a 'naive' checkout would work as you expected here -- but avoiding
that kind of cruft in the repos was sort of exactly the reason for
writing those tools in the first place :)  It's generated source, the
same as autotools output; it doesn't need to be in the VCS unless you
really have some personal reason to want it duplicated there.
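To make the idea of generating a patch series from history a little more
concrete, here is a toy model in Python.  It is NOT how gitpkg or
git-debcherry actually work: the Commit type, the export_series() helper
and the patch naming scheme (loosely in the style of git format-patch)
are all hypothetical.  It only shows the basic shape of the idea: commits
that change upstream files become numbered patches, while changes under
debian/ stay out of the series because they already ship as part of the
packaging itself.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Commit:
    """Hypothetical stand-in for a commit on the packaging branch."""
    subject: str
    changes: Dict[str, str]  # path -> diff text (toy placeholder for hunks)


def export_series(commits: List[Commit]) -> List[Tuple[str, Dict[str, str]]]:
    """Turn packaging-branch commits into a numbered patch series.

    Changes under debian/ are dropped from each patch; a commit that only
    touched debian/ produces no patch at all.
    """
    series: List[Tuple[str, Dict[str, str]]] = []
    for commit in commits:
        upstream_changes = {path: diff for path, diff in commit.changes.items()
                            if not path.startswith("debian/")}
        if not upstream_changes:
            continue  # pure packaging change, nothing to export as a patch
        # Turn the subject into a filename-safe slug, format-patch style.
        slug = re.sub(r"[^A-Za-z0-9]+", "-", commit.subject).strip("-")
        name = f"{len(series) + 1:04d}-{slug}.patch"
        series.append((name, upstream_changes))
    return series


history = [
    Commit("Add packaging", {"debian/control": "+Source: foo"}),
    Commit("Fix build with gcc 5", {"src/foo.c": "-old\n+new"}),
    Commit("Update changelog, fix docs",
           {"debian/changelog": "+...", "doc/foo.1": "-a\n+b"}),
]

for name, changes in export_series(history):
    print(name, sorted(changes))
```

Exporting a series like this into a scratch working tree is what lets a
plain dpkg-buildpackage produce a split-out 3.0 (quilt) package, without
the generated patches ever needing to be committed.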

It's not clear to me that any of the above is a better general
solution than just getting people on the "right track" for how
things work best to begin with - but I mention them because all
of those things are possible if you know you can do them.

I can think of some other far more "clever" magic that we could do
to simply make this work for a naive user that just does git clone
&& dpkg-bp -- but I don't think I'll mention it, because this is
the internet, and Poe's Law means someone might go "Wow!  That's a
Great Idea, I'm going to do that too" and then it would be My Fault :)


I suspect the Best Answer is documenting this somehow and somewhere,
but it's not clear to me yet where the best place for that would have
been to have avoided what happened to you.

Maybe a README.DebianFromGit or something, that's export-ignored
so it doesn't clutter the package but is seen in a cloned repo?
I can see a few ways that could be less than ideal too though.


> gbp-pq and git-dpm are the other way round: the tree can be built with
> dpkg-buildpackage, but the cost is that you have to commit in a way that
> isn't the normal git thing (either using a specific tool, or for the
> gbp-pq layout, dropping in pre-prepared patches and hoping they don't
> have conflicts, in the same way you might for svn-buildpackage).
> 
> I think I was also thrown by the fact that gitpkg does not encapsulate
> its configuration in what you commit: if two developers build the same
> tree, the debdiff might well be rather large, because one developer's
> .git/config results in separate git-debcherry patches and the other's
> .git/config results in a single large patch.
> 
> git-buildpackage reads both debian/gbp.conf and .git/gbp.conf, with the
> latter taking precedence. That lets maintainers provide "executable
> documentation" in debian/gbp.conf for "here is how I intend this repo
> to be used", which seems like something that could be rather useful for
> gitpkg: for instance, filter patterns for non-DFSG tarball imports can
> go in debian/gbp.conf as a way to avoid mistakes.
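As a rough illustration of the layering Simon describes, the Python
sketch below merges two INI-style files in order so that .git/gbp.conf
overrides debian/gbp.conf.  It uses the standard configparser module and
is not git-buildpackage's own code (which also consults other locations);
the load_layered_config() helper is hypothetical and the option names are
only illustrative.

```python
import configparser


def load_layered_config(sources):
    """Merge INI-style config texts in order; later sources win on conflicts.

    `sources` is a list of (name, text) pairs, lowest precedence first.
    """
    parser = configparser.ConfigParser()
    for name, text in sources:
        # read_string() adds to what is already loaded, so a key that is set
        # again by a later source replaces the earlier value.
        parser.read_string(text, source=name)
    return parser


# Committed with the packaging: how the maintainer intends the repo to be used.
committed = """
[DEFAULT]
pristine-tar = True

[import-orig]
filter = *.chm
"""

# A developer's private override, never committed.
local = """
[DEFAULT]
pristine-tar = False
"""

cfg = load_layered_config([("debian/gbp.conf", committed),
                           (".git/gbp.conf", local)])
print(cfg["DEFAULT"]["pristine-tar"])  # False: the local file wins
print(cfg["import-orig"]["filter"])    # *.chm: still from debian/gbp.conf
```

The committed file travels with every clone, so anyone building the repo
starts from the maintainer's intent, while the uncommitted file still
lets an individual override it.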

Yeah, there's a whole bunch of tradeoffs in that which we explored in
the very early development of gitpkg.  Since working in git was still
a very new thing to do (both for upstream developers and for Debian),
the questions of what Best Practice (or even common practice) would be
were still very open ended.  It was quickly apparent that being unable
to cope with anything that was legal to do in git was going to be
a fatal flaw in the tool, sooner if not later.

It was also fairly quickly apparent that no one workflow or tool was
ever going to achieve unquestioned global domination over all others,
so it was really important that repos managed with gitpkg didn't
somehow depend on it to remain useful into an uncertain and infinite
future.  It couldn't impose its own rigid framework structure on them,
and it shouldn't make them difficult to use with any other ostensibly
sane tool that did a similar thing.  And because it should be able to
export packages from *any* even half-sane repo, including package
versions that existed before gitpkg was ever written, and repos that
were created without any knowledge that it existed, it also couldn't
rely on having tool-specific config *in* the repo for the version
that you wanted to export.

In the same way that Debian source packages were designed to not
strictly require dpkg to be able to extract and build the source,
gitpkg was designed so that it would not be strictly required to
extract a source package from the repo.  It was just a shortcut
you could use to make your life much easier if you did actually
have it available.

Since many of the packages that I wanted to move to git were things
that I was also upstream for, in one degree or another, it was also
vitally important that it didn't get in the way of using the power
of git to its fullest extent, to make the work I needed to do as
efficient and painless as possible.  I don't tend to think of
'packaging' as some task that's kind of isolated from upstream
(or worse, something upstream should keep their sticky little
fingers right out of) -- it's really just an ordinary part of the
normal software development process.  It's a feature added to the
upstream source the same as any other, and can and should be
managed in exactly the same way.  And git was designed from the
ground up to know how to do that, and do it really, really well.

With the advent of distributed VCS, that really became something
that *everybody* could do.  You didn't need commit access to the
upstream repo, you could just clone it and develop your feature
branch of it.  With cvs-bp and svn-bp, unless you were upstream,
you *had* to import tarballs or similar.  With git, it was quite
clear that working like that was going to be a short-term anomaly,
not the long-term rule - so a tool modelled on those might have
some short-term familiarity advantages, and be an easy thing to
whip up fast, but it wasn't using the feathers git had grown to
actually take flight and soar into a new better way of working.

To segue back from that little side trip down memory lane to how this
is relevant to your original point, though: gitpkg did originally allow
some configuration from files inside the repo (and actually still does),
but one of the things that next became quickly evident about building
packages out of a DVCS was the question of Trust.  Since anyone can now
become a new upstream just by putting a clone of their repo somewhere
public, it's equally important to be able to export a source package
from it that you can then debdiff or otherwise audit, *without* letting
the content of that repo execute or influence things in any way that
you haven't explicitly allowed it to.

A tool that executes code from a random repo just in the act of getting
the source out of it is a loose cannon in a truly distributed world.
Which reinforced the importance of the decision to avoid needing some
special content in the repo for the tool to be able to work correctly.
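One way to express that policy, as a minimal sketch under assumed names:
local settings always apply, and a setting found in the repo is honoured
only if the local user has explicitly allowlisted that key.  The
effective_settings() function and every key name here are hypothetical;
this is not gitpkg's actual configuration handling.

```python
from typing import Dict, Set


def effective_settings(local: Dict[str, str],
                       in_repo: Dict[str, str],
                       trusted_repo_keys: Set[str]) -> Dict[str, str]:
    """Combine the user's local settings with settings found in the repo.

    Local settings always apply.  A repo-supplied setting is used only if
    the user has allowlisted that key and has not already set it locally;
    anything else the repo asks for (including hooks) is ignored.
    """
    settings = dict(local)
    for key, value in in_repo.items():
        if key in trusted_repo_keys and key not in settings:
            settings[key] = value
    return settings


local = {"patch-style": "single"}
from_repo = {
    "patch-style": "series",       # local choice takes priority
    "compression": "xz",           # allowlisted below, so accepted
    "export-hook": "./run-me.sh",  # never allowlisted: would run repo code
}
print(effective_settings(local, from_repo, {"compression", "patch-style"}))
# {'patch-style': 'single', 'compression': 'xz'}
```

The important property is the default: with an empty allowlist, nothing
in a freshly cloned repo can change what the export does.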


Before format 3 got invented, this was actually really easy to avoid
and only the weirdest possible cases might have had any issue with it.
The documentation of the config explicitly warns that depending on it
to get a correct source package is probably a sign that something about
your workflow really isn't quite right and you should probably think a
bit harder about what you're really trying to do.  Most of the useful
config options were just doing things with the package after it was
built, they didn't change the exported form or content in any way, so
they really were purely "local user preferences".

With the advent of format 3 (and to a slightly lesser extent p-t),
this did get more unfortunately complicated though.  If you are using
it, then you have the problem you described above.  We were fairly
careful about ensuring that you'll still get "a" correct package
regardless of your local configuration (it will contain exactly the
same source once patched and build exactly the same binaries), but
you may not get a source package identical to the one someone else
gets in that format [since format 3 allows the "same" source to be
packed in a potentially infinite number of different ways that all
extract to the same thing, this is basically just a subset of that
problem inherent in its design].
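A toy Python model of that many-to-one relationship: the same two
changes packed as a two-patch series or as one combined patch give
different source packages but an identical patched tree.  Patches here
are simple (file, old, new) substitutions rather than real quilt diffs,
and apply_series() is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Toy patch: a list of hunks, each replacing `old` with `new` in one file.
Hunk = Tuple[str, str, str]  # (filename, old, new)
Patch = List[Hunk]


def apply_series(tree: Dict[str, str], series: List[Patch]) -> Dict[str, str]:
    """Apply patches in order to a copy of `tree` (filename -> contents)."""
    result = dict(tree)
    for patch in series:
        for filename, old, new in patch:
            if old not in result[filename]:
                raise ValueError(f"hunk does not apply to {filename}")
            result[filename] = result[filename].replace(old, new, 1)
    return result


upstream = {"src/main.c": "int x = 1;\nint y = 2;\n",
            "Makefile": "CFLAGS = -O2\n"}

fix_x = [("src/main.c", "int x = 1;", "int x = 10;")]
add_debug = [("Makefile", "CFLAGS = -O2", "CFLAGS = -O2 -g")]

split_series = [fix_x, add_debug]   # one patch per logical change
single_patch = [fix_x + add_debug]  # everything folded into one patch

print(split_series == single_patch)  # False: different source packages
print(apply_series(upstream, split_series)
      == apply_series(upstream, single_patch))  # True: same patched tree
```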

In one line of thinking, this is "a feature" (a local user can get
exactly the kind of source package *they* want, regardless of what
you prefer - if the maintainer doesn't care about patch series and
just uses the single patch mode, you can still export one with a
properly split up series if you want it), but I do agree that in
another dimension, the uncertainty of which style the "original
maintainer" used and prefers could be an issue, and at the very
least that is information that may currently be getting 'lost'.

There is a certain innate tension between "you can use gitpkg on
any repo because it doesn't need to be modified to do that" and
"how do I know if a repo was using gitpkg because nothing in it
was modified to tell me that" :)  Which isn't a problem if you're
a habitual gitpkg user anyway, but doesn't give you many clues
if you're coming from some other school.

In the latest gitpkg release (which is in Jessie), we've added an
examples/README.debcherry-export as a template that people might
like to include in package repos where they are using this, but it's
fairly new, so it wouldn't have helped you back then, and there are
still a bunch of open questions about what the best way to handle
this will really be.


I'd still like to avoid *depending* on magic files in the repo for
things to be able to work correctly, but that is a separate question
from whether there is *something* we can add to the repo that makes
life for someone looking at a clone of it for the first time a bit
easier than it currently might be.

That does seem like something this document might be able to address.
Even if we truly missed the boat long ago for naming conventions to
be of much real use at all, I think there is scope for
having a standard place to look that summarises the things it would
be useful to know about what's in the repo, how to get it out and
into a source package, and what to be aware of if you're modifying it.

I'd have almost surely already added something like that to my existing
repos if it had ever been a FAQ from people trying to use them, but
since nobody has ever actually asked that before your feedback here,
it seemed like a "blind premature optimisation" problem :)  That would
be less of a problem if there was a common place we could reasonably
expect that people from the future would know to look for it, and what
it ought to cover that isn't "self-evident" or "common knowledge" that
should be documented more generally elsewhere.


  Cheers,
  Ron


