Re: RFC: DEP-14: Recommended layout for Git packaging repositories
On Thu, Nov 13, 2014 at 09:46:13AM +0100, Raphael Hertzog wrote:
> [ I skip the more detailed discussions on naming conventions to
> concentrate on your higher level questions for now ]
Agreed, if we solve the tricky problems, that part is mostly just
yak shaving (and if we can't, it's probably mostly irrelevant ...)
> On Thu, 13 Nov 2014, Ron wrote:
> > Sure, I understood those were your goals.
> >
> > What I haven't seen, and what I'm asking for, is an actual detailed
> > rationale describing the actual detailed problem(s) that you think
> > these goals will be a remedy for.
>
> Problem 1: the derivatives
> --------------------------
>
> So I am a Kali Linux contributor. We use git repos to maintain all our
> packages and we use git-buildpackage.
I guess the first question there is what were the arguments put forward
for deciding to 'standardise' on gbp? If there wasn't one, maybe that's
an argument you should have (and if there was, maybe it's one to revisit :)
If you have a clearer idea now of the problems you are facing, it might
make properly evaluating things that avoid those problems easier.
> Most of the Kali contributors
> are not long-term Debian contributors, I write documentation so that
> they can contribute to Kali (while basing our work on Debian).
>
> To make this manageable I opted to always use a workflow based on
> "git-import-orig". Even when Debian has its own git repo, we start
> from the released source packages because that's the only level of
> uniformity that we can rely on. And it's a pity because if we could
> build on Debian's git repo, it also means that our work would be easier
> to merge for the Debian maintainer.
That's not a totally terrible way to kick off a new repo when there
isn't an existing one. I wrote git-debimport to build a history from
just existing Debian source packages when that's all you have, and
debsnap (in devscripts) originally got written for use with it to
collect the whole set from snapshot when you didn't already have them.
And you can fairly easily splice a repo created with it to a real
upstream one to continue maintenance in a more sensible way from there.
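One possible way to do that kind of splice with current git is a graft via `git replace`. This is only a sketch: the throwaway repo, the branch names `upstream` and `imported`, and the commit messages are all made up for illustration, and it is not necessarily the method used for the repos mentioned here.

```shell
#!/bin/sh
# Sketch: join an imported, parentless history onto an upstream commit.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/upstream
git config user.name demo
git config user.email demo@example.org

# Stand-in for the real upstream history.
git commit -q --allow-empty -m "upstream 1.0"

# Stand-in for a history rebuilt from released source packages:
# an unrelated branch with its own root commit.
git checkout -q --orphan imported
git commit -q --allow-empty -m "import 1.0-1"
git commit -q --allow-empty -m "import 1.0-2"

# Graft: make the root of the imported history a child of upstream.
root=$(git rev-list --max-parents=0 imported)
git replace --graft "$root" upstream

# The imported branch now reads as a continuation of upstream.
history=$(git log --format=%s imported | tr '\n' '|')
echo "$history"
```

Replacement refs like this live under refs/replace/ and are local unless you push them explicitly, so a shared repo would still need the history rewritten (or the replace refs published) to make the splice visible to everyone.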
But it is a kind of terrible way to continue maintaining them if there
is a repo. Unfortunately I think that if you make "uniformity" the
overarching consideration here, you've basically doomed yourself to
"failure" from the outset.
Even if everyone did stick to the conventions already discussed
(which in reality, they won't) there are still far too many degrees of
(quite necessary) freedom to really approach a "just follow these
three easy steps" kind of uniformity. And even if you got close to
achieving that for the "debian branches", the (again necessary)
variability between upstream repos is going to be even greater.
I think at some point you're going to have to rely on real human
intelligence to be able to look at a repo and form their own
understanding of its structure. Most really aren't all that
complicated (however much they vary between each other), and in
the worst case you can always actually ask the 'upstream'
maintainer to explain anything that is unclear.
That's going to get annoying really fast if someone new asks me
really basic questions like that every week. But if you do this
right, it's also something that in the worst case someone should
only ever need to ask once: if there are no 'dedicated' maintainers
for individual packages and this is needed, they can document what
they learned for the next person.
I don't think I've ever had to ask an upstream "how does your repo
work" though. So you possibly could also handle this internally
with a few knowledgeable mentors too.
> They could add the Kali repo as a
> remote (without fearing any conflict in terms of tags, branch names)
> and just merge or cherry-pick as appropriate.
All that said, this part however is not a problem at all. If you've
branched your repo off the 'upstream' one (so you share its history)
then there's never going to be a conflict between branch names.
You can have a Kali repo, cloned from a Debian repo, cloned from
an upstream repo, and *all* of those repos could have their own
separate and distinct branch called "master", and still there would
be no conflict.
git is already going to namespace them so when you add the remotes
the branch refs will be (for remotes named upstream, debian, kali):
    remotes/upstream/master
    remotes/debian/master
    remotes/kali/master
What you name those branches if you check them out locally is totally
up to you. The local names need to be unique, but you can:
    $ git checkout -t debian/master -b debian-master
    $ git checkout -t kali/master -b master
And that will work just fine.
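If you want to see this for yourself, here is a minimal sketch using throwaway repos. The directory names, commit messages and the `commit_in` helper are made up for illustration; only the remote names match the ones used above.

```shell
#!/bin/sh
# Sketch: three repos sharing history, each with its own "master".
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: add one empty commit to the repo in directory $1.
commit_in() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.org \
        commit -q --allow-empty -m "$2"
}

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
commit_in upstream "upstream work"

git clone -q upstream debian
commit_in debian "debian packaging"

git clone -q debian kali
commit_in kali "kali changes"

# A fourth repo that tracks all three as remotes.
git init -q local
cd local
for r in upstream debian kali; do
    git remote add "$r" "../$r"
    git fetch -q "$r"
done

# Each remote's master lives in its own namespace.
remote_refs=$(git for-each-ref --format='%(refname)' 'refs/remotes/*/master')
echo "$remote_refs"

# Local names are ours to choose.
git checkout -q -b debian-master --track debian/master
git checkout -q -b master --track kali/master
local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "$local_branches"
```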
Tag names aren't quite so forgiving. But realistically, even if
you simply name your tags v$version in all of upstream, debian,
and kali, then if you actually have conflicting names you were
already in deep trouble anyway, because now you have a kali
package with *exactly* the same version as a debian package.
If it really is the 'real' debian package, you have no problem
and don't need a kali tag for it anyway. If it's a kali special,
then the package already ought to have an x.y.z-1kali2 type version
anyway. So this part already naturally avoids conflicts too.
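If you'd rather check than trust that, something like the following sketch will list any tag names that exist on both sides but point at different objects, before you fetch them. The `tag_conflicts` and `commit_in` helpers, the repo directories and the version numbers are all made up for illustration.

```shell
#!/bin/sh
# Sketch: detect tag names that clash between this repo and another one.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: add one empty commit to the repo in directory $1.
commit_in() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.org \
        commit -q --allow-empty -m "$2"
}

# Hypothetical helper: print tags present both here and in repo $1
# that do not point at the same object.
tag_conflicts() {
    git ls-remote --tags "$1" | while read -r sha ref; do
        case $ref in *'^{}') continue ;; esac          # skip peeled entries
        mine=$(git rev-parse -q --verify "$ref") || continue  # not here yet
        [ "$mine" = "$sha" ] || echo "${ref#refs/tags/}"
    done
}

git init -q debian
commit_in debian "debian 1.2.3-1"
git -C debian tag v1.2.3-1

git clone -q debian kali
commit_in kali "kali changes"
git -C kali tag v1.2.3-1kali2

cd debian
# Distinct versions give distinct tag names, so nothing clashes.
clean=$(tag_conflicts ../kali)
echo "conflicts with distinct versions: '${clean}'"

# The "deep trouble" case: kali reuses a Debian version for other content.
git -C ../kali tag -f v1.2.3-1 >/dev/null
clash=$(tag_conflicts ../kali)
echo "conflicts after reusing a version: '${clash}'"
```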
> And we could also build on work in progress that has not yet been
> released as a source package. Right now, the only packages where we
> build on top of the Debian git repositories are some native packages
> (like debian-installer).
I can't really tell you how kali ought to work, but if you can solve
it for this case, you should be able to do the same for non-native
packages too. And I would recommend that you try to, even if you
can't get the "uniformity" that you'd think of as ideal.
What is the exact difference that makes native packages work for this
but others be harder? In the case of gitpkg, this is really just
the difference between passing one ref and two to export it.
> I can't afford to document all the possible ways Debian is maintaining
> their package but if I can write a documentation that covers the common
> case and if I can tell them "when you see those branches, you can follow
> the instructions below", then we have made some real progress.
I really do think that the names of the branches are actually going to
be the least of your worries here, unfortunately. Even with a naming
scheme that's widely adopted, things just aren't going to be that sort
of uniform outside of (a fairly large number of) fairly small subsets.
You're going to need a better solution than this unfortunately, if you
want it to actually work.
> Problem 2: interoperability between the tools
> ---------------------------------------------
>
> I am part of the Python Modules team who wants to switch to git but not
> all contributors are using the same git helper tools and yet we would like
> to all work together on the same repositories without forcing everybody
> to use the same helper tool (habits are hard to change).
Really the only way this can possibly work is if all the tools that
you consider viable candidates to use *don't* require some special
structure of their own in the repo to work (or knowing the right
magic incantation specific to each repo for them to work correctly).
Otherwise, the tools that do are simply never going to work with a
repo that doesn't have that. There really isn't any way around that,
except to replace those tools with better ones ...
I completely agree with the habits thing. I blame svn for all sorts
of terrible habits that seem to persist perniciously :)
At some point though, you'll probably have to decide if the habits
that seem acceptable in private are habits that really are acceptable
in a group situation. It might pay to remember that a willingness to
fix bad habits *is* what separates the best and most productive
developers from the pack ...
> We can't just let each maintainer use the default layout suggested
> by his preferred helper tool and the default tagging scheme, we have to
> define some common layout for the whole team. Then it matters less if
> people are using git-buildpackage or git-dpm or gitpkg.
Again, the tagging and "layout" are going to be the least of your
problems here. Even if you get consistency there, most of these
tools still aren't going to be interchangeable if they make
assumptions about repo content that git itself doesn't enforce.
> It might be awkward at times but at least there is some consistency among
> the team, and the few problems that will arise will be occasions to
> improve the tools.
>
> But we have to define the common layout to use and this discussion should
> hopefully solve this too.
I think the differences you're going to really get stuck on are far
more fundamental than that. I think until you sort those out (or
rule out use of the tools for which this is not reconcilable) it's
premature to worry about layout conventions.
If you solve that, any convention "will do". If you don't, no
convention is going to be able to help.
It's not enough to just say "if all the layouts looked more like
what gbp uses, everything will interop". That's not the thing
that makes gbp different from the other tools.
> Problem 3: making it easier for new contributors
> -------------------------------------------------
>
> While I can appreciate the versatility of gitpkg, new contributors
> are looking for guidance and clear instructions. It's difficult to give
> those when we have zero common ground on how we manage our git
> repositories within our project.
You should probably go have a play with gitpkg :)
I have a hunch you're going to be shocked at how trivial the
instructions for using it will be. And how much of what you
say you want is going to Just Work out of the box with it.
We talked Jacob through this on IRC a few weeks back. He came from
having read some hideously complicated tutorial for using gbp, and
was bashing his head against the wall trying to figure out how to
do the simple task of being able to manage his debian patches against
his own upstream repo.
The biggest problem he had once we started that, was *unlearning*
that this was a hideously complicated process. It took longer to
explain "no, no, you don't need to do all that stuff anymore" than
it did to explain what he did need to do. He simply couldn't
believe at first that it was all he needed to do.
I'm happy to walk anyone else through that if it's not immediately
intuitive to them and they want to try it. Primacy in learning is
a powerful thing. Training someone to copy a bad habit is a lot
easier than teaching them to break out of it again.
If you're worried about new contributors, this really is something
you should seriously explore in more depth.
> While I don't see us converging on any single helper tool right now,
One thing that might help existing users here, in the group case
you described above, is it's probably quite possible to build some
front ends to gitpkg that mimic the other tools, for people who
simply find that finger memory is hard to break.
Unlike several of them, gitpkg didn't try to be a "framework" for
package management. It simply focussed on the job of exporting a
source package, in any valid source package format, from an existing
git repo. It provides hooks to let you tweak that, and to perform
automated tasks after export (like shipping the package off to be
built), but it explicitly doesn't provide high level functions for
repo management tasks *prior* to export.
Mostly because it's trivial to write those as separate scripts
which call gitpkg once it's time to do the export.
So you might find that it's actually much easier to implement any
functionality from the more troublesome tools that you do find
valuable on top of gitpkg than it is to actually fix the more
fundamental problems in those tools themselves ...
> it's important to start taking steps that bring them closer so that
> we can give more useful explanations to newcomers and so that they
> can get started
>
> > Likewise, it's not clear to me that tools other than gitpkg are
> > actually interchangeable, because they weren't designed to be from
> > the outset and rely on magic being committed into the repo to work.
> >
> > I don't really see how some naming conventions can fix that either.
>
> Naming conventions won't fix that but it's still a pre-requisite
> to be able to fix the tools that (unlike gitpkg) voluntarily set (by
> default) more constraints on the expected layout of the repository.
Like I said above, I don't think that's going to get you past their
more fundamental differences.
> > Maybe if you start by detailing the problems, we will be able to
> > see some better solutions that actually achieve your real goals
> > and result in real improvements to the tools that created them.
>
> Let's see!
Well, at the very least I can promise you that if we find real
limiting problems in gitpkg that will be something I'll be keen
to see remedied!
Whether the other tools can be fixed ... yeah. After all, there
was a reason gitpkg was born as a separate thing in the first place :)
> > > Fine if the other tools do not need anything like that. But who knows,
> > > maybe you will want to enhance git-debcherry to not only update
> > > debian/patches/ but also store the corresponding git branch for long-term
> > > storage. In which case, you will already have a recommended tag name
> > > for this purpose :-)
> >
> > Why would you want to do that?
>
> To share them with upstream in a form ready for merge (or more practical
> for review/analysis, etc.).
Ok. Time for me to share a case study :)
I've never really bought the argument that "quilt patches in a debian
package will result in them being more efficiently upstreamed", and
I've never really seen any evidence to support it -- but we don't
need to rely on speculation, I have a real example to share!
I fairly recently adopted tftp-hpa after it was orphaned. The previous
git repo was deleted when it was orphaned so I never got a copy of that
and I'm not sure exactly what form it was in, but I believe it was using
tarball imports using gbp, and exporting format 3 packages with a quilt
series for patches to upstream. There were upstream patches languishing
in there dating back ~5 years.
I quickly enough reconstructed a base history with a git-debimport
of snapshot.d.o (as described above), spliced that to the real upstream
git repo, pulled their latest changes, ditched the debian/patches and
turned them all into real commits, exported and uploaded a new package,
and pushed my repo to alioth.
... less than 8 hours later, *all* of those patches had been cherry
picked and merged into the mainline upstream repo.
This is why I think having them as real commits, and working in the
way that real upstreams really work with git, is infinitely more
valuable than having a patch series in either the package or as
duplicated diff-of-a-diff commits to the repo. This is far from
the only upstream that I've had rapidly accept patches in this way.
Now that gitpkg can automatically find and export them as a
debian/patches series for people who want that too, that can be
done at basically "no cost", so there really is no sensible
reason to be handling those separately anymore, even with the
assistance of a "helper tool". Just use git as git was designed
and if you still want them in the package, let the package export
automation do the rest.
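To illustrate the general idea with nothing but plain git (this is not gitpkg's own export mechanism, just a sketch of the same principle), real commits on a packaging branch can be turned into a patch series mechanically. The branch names `upstream` and `debian`, the file and the commit messages are made up for illustration, and a real packaging branch would also carry its debian/ directory.

```shell
#!/bin/sh
# Sketch: derive a debian/patches series from real commits.
set -e
work=$(mktemp -d)
cd "$work"
git init -q pkg
cd pkg
git symbolic-ref HEAD refs/heads/upstream
git config user.name demo
git config user.email demo@example.org

# Stand-in for the upstream release.
echo 'int main(void) { return 0; }' > main.c
git add main.c
git commit -q -m "upstream release"

# Distro changes kept as ordinary commits on their own branch.
git checkout -q -b debian
echo '/* fix 1 */' >> main.c
git commit -q -am "Fix the first thing"
echo '/* fix 2 */' >> main.c
git commit -q -am "Fix the second thing"

# One patch file per commit that is on debian but not on upstream,
# plus a quilt-style series file listing them in order.
mkdir -p debian/patches
git format-patch -q -o debian/patches upstream..debian
( cd debian/patches && ls *.patch > series )
series_content=$(cat debian/patches/series)
echo "$series_content"
```

The same commits are what an upstream would cherry-pick directly, which is the point: the patch series is a derived artifact, not something to maintain by hand.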
There's no better way to share with an upstream using git than
using git in the way that upstream already understands.
(but yes, I know, bad habits die slowly :)
> > A lot of the problems you seem to be worried about here are things that
> > gitpkg was designed to avoid ever having from the outset and simply doesn't
> > have. I think if we can raise awareness about those things and fix them
> > in the tools that have them, that would be an awesome thing. I'm less
> > excited at the idea of codifying those limitations as if they were an
> > inevitably necessary thing, as a way to avoid fixing problems in the
> > tools that might have them though. That would just paint us all into
> > a corner that will be even harder to get out of again later.
>
> I understand, though we can certainly set a default naming convention
> without codifying it as limitations to be imposed. My goal is not to
> restrict the workflows, my goal is to standardize the basic (common)
> concepts and associated branch/tag names and build on that to improve all
> our tools.
Oh, the other one I meant to mention about this the other day and then
forgot, is you also can't rely on "retagging" an upstream branch to
not break things.
An increasing number of upstreams are doing automated version stamping
based on the presence of tags in the repo, and if you lay a new tag on
top of theirs, some proportion of those things will surely break too.
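A small sketch of how that goes wrong, assuming an upstream whose build stamps its version from `git describe --tags` (one common approach, but an assumption here); the tag name `upstream/1.0+git1` is a made-up example of a packaging-side tag.

```shell
#!/bin/sh
# Sketch: an extra tag changes what tag-based version stamping reports.
set -e
work=$(mktemp -d)
cd "$work"
git init -q upstream
cd upstream
git config user.name demo
git config user.email demo@example.org

git commit -q --allow-empty -m "release 1.0"
git tag v1.0
git commit -q --allow-empty -m "post-release work"

# What upstream's build would see: their tag plus a distance and hash.
before=$(git describe --tags)
echo "before: $before"

# A packager lays their own tag on the current commit ...
git tag upstream/1.0+git1

# ... and the same command now reports the packager's tag instead.
after=$(git describe --tags)
echo "after:  $after"
```

Anything in the upstream build that parses the first form will get a string it never expected once the second tag is present.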
The core problem with making conventions like that though, is that as
soon as anything relies on assumptions that aren't something which git
itself prohibits, it's simply a matter of time before that thing will
explode in your face, on some repo, somewhere.
So if you aren't going to enforce them (which realistically is quite
impossible to do anyway), ultimately the tools need to support
exceptions to them. And as soon as the tools do that, and do that
well, the need for the conventions (beyond the ones that already
exist outside of Debian as common best practices) quickly evaporates.
There really isn't anything special about Debian branches so far as
git is concerned. They're really no different from any other feature
branch in any other repo. It's the trying to make them special, like
they had to be for CVS and SVN, that has been the cause of so much
confusion and overcomplication in this space.
That's the trap that a lot of tools have fallen into. Maybe some of
them can still dig themselves out of it, maybe some of them we'll
find are fundamentally stuck fast in it. But I do believe that gitpkg
is a working proof that this is a trap that can be avoided.
I'd certainly be interested in any feedback you have on what gitpkg
*doesn't* do for you that you find to be an essential feature of gbp.
I'd be quite surprised if any of those things weren't quite easy to
fix with some fairly simple scripting.
Cheers,
Ron