Re: RFC: DEP-14: Recommended layout for Git packaging repositories

To: Simon McVittie <smcv@debian.org>
Cc: debian-devel@lists.debian.org, Ron <ron@debian.org>, Guido Günther <agx@sigxcpu.org>, "Bernhard R. Link" <brlink@debian.org>
Subject: Re: RFC: DEP-14: Recommended layout for Git packaging repositories
From: Ron <ron@debian.org>
Date: Sun, 16 Nov 2014 21:29:34 +1030
Message-id: <20141116105934.GM10827@hex.shelbyville.oz>
In-reply-to: <54679CDB.4090405@debian.org>
References: <20141111212624.GA19601@home.ouaza.com> <16276435.4WkQ4eyy62@scott-latitude-e6320> <20141112091527.GE27081@home.ouaza.com> <20141112114441.GV8092@hex.shelbyville.oz> <20141112131455.GM27081@home.ouaza.com> <20141112232953.GB10827@hex.shelbyville.oz> <20141113084613.GA1392@home.ouaza.com> <20141113140412.GC10827@hex.shelbyville.oz> <54679CDB.4090405@debian.org>

On Sat, Nov 15, 2014 at 06:35:07PM +0000, Simon McVittie wrote:
> On 13/11/14 14:04, Ron wrote:
> > I really do think that the names of the branches are actually going to
> > be the least of your worries here, unfortunately.  Even with a naming
> > scheme that's widely adopted, things just aren't going to be that sort
> > of uniform outside of (a fairly large number of) fairly small subsets.
> 
> I agree that the expected contents of the branches are far more
> important than their names. Unfortunately, while acting as "the Debian
> expert" for Debian derivatives at $day_job, I keep finding that the
> answer to "OK, I've cloned a package's git repository, I know what code
> change I want, now do I change the upstream source or drop a patch into
> debian/patches or what?" is "... I can't actually answer that until you
> tell me which source package you're working on".

I'm not sure that problem is actually avoidable if a generic set of
instructions is all the person asking has to work with.  At least
not outside something like pkg-perl, where upstream itself is already
very well structured through something like CPAN, and the group
maintaining it inside Debian has a very coherent plan (and/or set
of tools).

I mean, if you had to ask *me* that question about one of *my* repos,
I'd *still* have to ask you which source package you were working on :)
Partly because every upstream is different in some small way or another,
making the best way to work with them have some slight variations, and
partly because I think best practice here is still an evolving thing
and so I'm still (after all these years) experimenting with slightly
different things that may (or may not :) work better for a specific
type of problem.

None of them are drastically different from the others, and a bunch of
common best practices have emerged.  But if you have to ask because
you couldn't figure it out for yourself, the answer you need is
probably going to have some domain-specific details to it.

I do think you might have seeded a useful clue to how we might better
approach this in the other thread though (and sorry for the long answer
there, but I do think background information and details are an
important part of getting everyone on the same page for understanding
the real problems and what sort of solutions might really work here).

> At the moment, I suspect Kali's approach - arbitrarily choosing one of
> the popular approaches, only cloning packaging repositories from Debian
> that happen to match that approach, and restarting a new packaging
> repository for those that do not - is likely to be the only viable
> solution to that. There's always going to be a certain amount of
> re-importing in any case, because some packages in Debian are maintained
> in a non-git VCS or in no VCS at all; but it's easier to inspect history
> if it's possible to clone the existing packaging repository for "most"
> Debian packages of interest.

I think there are two fairly distinct problems there, at least if
I've correctly understood what Raphael has described of it so far.

In the case he most recently described "we have a relative novice
who just needs to apply a patch from somewhere else, or import a
new upstream tarball" - it's not really clear that doing that work
"manually" from a VCS is the best, or even a good, solution anyway.

In such a case, it's probably much easier to just say "Download
the existing package, modify it as required, and upload it".
And then if the project as a whole really wants that in a VCS,
it can just import that new package to its VCS history automatically
from the uploaded one.

Teaching them how to use dpkg is much simpler and less fraught with
unwanted complication than trying to force them into setting up gbp
and getting its use correct too, just to do something that doesn't
really need it or benefit from it at all.

In the case he originally described "we want people to be able to
apply changes that can easily be pushed back to Debian", the only
viable answer really is to clone the Debian maintainer's repo,
figure out its structure, and commit your patch in a way that
lets that actually happen (no different to how you or the Debian
maintainer would push a patch to the upstream maintainer if they
are using git).

While I don't think naming conventions for branches or tags can
really help with that at all, I do believe there may be ways that
we can provide some extra "meta data" which could help with this.
But before I can really say what extra data might be helpful, or
even decide if I really believe it wouldn't just also be a major
waste^W inefficient use of effort, I'd really like to see some
*actual* questions from the actual people having actual trouble
with that.

The feedback you gave about where you came unstuck with gitpkg
the first time you encountered it was great.  There are some
actionable things in it that we can look for good ways to
improve, which might avoid that for others in the future.  If we're just
starting from a hypothetical person though, who hypothetically
doesn't understand where to start, then the problem space is
just way too huge for us to address in any meaningful way, or
to identify things we'll live long enough to be able to fix.
The only answer I can really give for that today is probably
"use google, maybe you'll find this thread" :)

Raphael: do you have public discussion of people asking about
these sorts of things that you could give us some pointers to?
(And if not, could you get Kali to set up a list where such
questions get asked, and point us to the interesting bits when they
come up, so we can actually respond to real problems rather than
trying to imagine them all ourselves?)

I'm sure there are a few that other people here can tell us about
from their own experiences to get things started, but this seems
like something we should evolve as a set of living best-practice
recommendations based on real input, rather than throw over the wall
based on some brainstorming by people who aren't having problems.

> One of my projects for the near future is to put together some simple
> test-cases for packaging - a set of simple projects with a downstream
> patch that gets applied in the next upstream release, a downstream patch
> that doesn't get applied upstream, and a downstream patch that conflicts
> with upstream changes - and try packaging them with each of gbp-pq,
> git-dpm and gitpkg. To have the complete set, I think I need one project
> where the upstream tarball is a simple git-archive of the upstream git
> repository, one where the upstream tarball has extra detritus (e.g.
> Autotools) and/or missing files (upstream's .gitignore not being in the
> tarball is also common in Autotools), and one where the Debian
> maintainer needs to filter out a non-free file.

That sounds awesome.  I'm definitely interested in being kept in the
loop about any problems that shake out of that.  For git-debcherry
in particular we went through a whole bunch of "what if" scenarios
(and redesigned and rewrote it a whole bunch of times based on the
ones that shone a light on some awkward corner case before we ran
out of those and unleashed it on other real users) but it's nearly
impossible to be exhaustive about that, so more people coming up
with more of them independently can only be a good thing.

> Anything else you can think of?

I have projects where I export orig tarballs directly from the
upstream branch (and never import them), either because I am
upstream and so those become the official orig too, or because
the upstream tarballs uploaded elsewhere aren't signed and/or
contain junk we don't care about (which may or may not be
non-free).

In some of those cases I use .gitattributes to simply export-ignore
the things we don't care about (so they are in the repo, but not
in the tarballs).
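For anyone who hasn't set up this kind of layout, here is a minimal Python sketch of the two pieces involved, under stated assumptions: the upstream release is tagged in the repo (`upstream/2.3` below is a hypothetical tag name), and the orig follows Debian's `<source>_<upstreamversion>.orig.tar.gz` naming, with the epoch and Debian revision dropped. It only builds the `git archive` command line and the .gitattributes text; it is not how gitpkg does it. By default `git archive` reads export-ignore from the .gitattributes committed in the tree being exported, so that file needs to be on the branch you export from.

```python
import shlex


def upstream_version(debian_version: str) -> str:
    """Drop an optional epoch ("1:") and Debian revision ("-1") from a version."""
    if ":" in debian_version:
        debian_version = debian_version.split(":", 1)[1]
    # The Debian revision is whatever follows the *last* hyphen, if any.
    if "-" in debian_version:
        debian_version = debian_version.rsplit("-", 1)[0]
    return debian_version


def orig_export_command(source: str, debian_version: str, ref: str,
                        outdir: str = "..") -> list[str]:
    """Build (but do not run) a `git archive` call exporting `ref` as the orig."""
    version = upstream_version(debian_version)
    return [
        "git", "archive",
        "--format=tar.gz",
        f"--prefix={source}-{version}/",   # top-level directory in the tarball
        "-o", f"{outdir}/{source}_{version}.orig.tar.gz",
        ref,                               # e.g. an upstream release tag
    ]


def export_ignore_attributes(paths: list[str]) -> str:
    """Render .gitattributes text keeping `paths` in the repo but out of archives."""
    # .gitattributes itself is exported too unless it is marked as well.
    entries = [*paths, ".gitattributes"]
    for p in entries:
        if any(c.isspace() for c in p):
            # Simplification: whitespace separates a pattern from its attributes.
            raise ValueError(f"pattern containing whitespace not handled: {p!r}")
    return "".join(f"{p} export-ignore\n" for p in entries)


if __name__ == "__main__":
    # Hypothetical package "foo", Debian version 1:2.3-1, tag "upstream/2.3".
    print(shlex.join(orig_export_command("foo", "1:2.3-1", "upstream/2.3")))
    # git archive --format=tar.gz --prefix=foo-2.3/ -o ../foo_2.3.orig.tar.gz upstream/2.3
    print(export_ignore_attributes(["/junk", "/upload-notes.txt"]), end="")
```

Marking .gitattributes itself export-ignore is a matter of taste; leaving it out of the list would ship it in the orig.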

I have some where the upstream branch does not contain
autotools-generated files, but for which I create an integration branch
that does commit them for each release (and which the .orig is
exported from), to keep a record of a known-working autotools
version that can be diffed against when some new version of
autotools Breaks Stuff (less common in the last few years, but
I've been burned often enough by it that it's cheap insurance).

(Someone mentioned pre-built doxygen docs too, which I've personally
never cached that way, though I can see the temptation to, given
how different versions of doxygen can create terrible docs from the
same source, and the markup almost always needs modification for
newer versions of it, which doesn't always work with older ones.)

I have a couple where both upstream and debian patches all go
onto just one single branch, either because I am upstream and
a new debian revision is never released without also releasing
a new upstream tarball, or because they got converted from CVS
that way and I've just been too lazy^W^W^W never seen the need
to split the branches since then.  Or because they are native
packages.  gitpkg has an option to create a "fake orig" from
such a layout if the package isn't native (which just leaves the
debian/ directory out of the orig that is exported).
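A minimal sketch of what such a "fake orig" step amounts to, in Python with only the standard library. It assumes an uncompressed tar stream like the one `git archive --format=tar --prefix=foo-1.0/` produces, with everything under a single top-level directory, and simply drops any top-level debian/ entry while recompressing. This illustrates the idea only; it is not gitpkg's actual implementation, and the package name and files in the example are hypothetical.

```python
import io
import tarfile


def strip_debian_dir(tar_bytes: bytes) -> bytes:
    """Return a gzipped copy of an uncompressed tar stream without its debian/ dir.

    Assumes every member lives under one top-level directory (e.g. "foo-1.0/"),
    so the packaging directory shows up as "<prefix>/debian".
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as src, \
            tarfile.open(fileobj=out, mode="w:gz") as dst:
        for member in src.getmembers():
            parts = member.name.split("/")
            # parts[0] is the prefix directory; parts[1] is the first
            # path component inside the source tree.
            if len(parts) > 1 and parts[1] == "debian":
                continue
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
    return out.getvalue()


def example_tarball() -> bytes:
    """Build a tiny hypothetical single-branch export: an upstream file plus debian/."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:") as tar:
        for name, data in [("foo-1.0/README", b"hello\n"),
                           ("foo-1.0/debian/control", b"Source: foo\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    orig = strip_debian_dir(example_tarball())
    with tarfile.open(fileobj=io.BytesIO(orig), mode="r:gz") as tar:
        print(tar.getnames())  # ['foo-1.0/README']
```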

I've a feeling I'm forgetting one or two other weird configurations
that exist for some reason or another too, but "export directly
from upstream's branch" is probably the most generally interesting
one.  The other reason I use it is for releasing snapshots
of the upstream repo that include some important patch(es), where the
branch up to that point is sane enough to be released but hasn't
actually been tagged as a formal new upstream release.

  Ron
