Re: [ANNOUNCE] git-series: track changes to a patch series over time

To: Ian Jackson <ijackson@chiark.greenend.org.uk>
Cc: debian-devel@lists.debian.org
Subject: Re: [ANNOUNCE] git-series: track changes to a patch series over time
From: Josh Triplett <josh@joshtriplett.org>
Date: Fri, 12 Aug 2016 12:46:54 -1000
Message-id: <20160812224654.7kr27lauqijw5rr6@x>
In-reply-to: <22445.56506.533307.776114@chiark.greenend.org.uk>
References: <20160729065717.GA25788@x> <22444.27780.310164.556317@chiark.greenend.org.uk> <20160812035657.c72mu75x5bmygvva@x> <22445.45989.114882.318467@chiark.greenend.org.uk> <20160812140505.nztdjzpa5nzwppxg@x> <22445.56506.533307.776114@chiark.greenend.org.uk>

First, a summary: I think we've both set out to solve different
problems, and have built a format that meets the requirements we
defined.  We have different requirements, and thus different solutions;
neither of our solutions meets the combined set of requirements, and in
fact I think some of the requirements may directly contradict each other.
That seems fine.

On Fri, Aug 12, 2016 at 03:27:06PM +0100, Ian Jackson wrote:
> Josh Triplett writes ("Re: [ANNOUNCE] git-series: track changes to a patch series over time"):
> > On Fri, Aug 12, 2016 at 12:31:49PM +0100, Ian Jackson wrote:
> > > Josh Triplett writes ("Re: [ANNOUNCE] git-series: track changes to a patch series over time"):
> > > > Do you see another possible storage format that meets all the
> > > > constraints above?
> > > 
> > > Well, there is the obvious "pseudo-merge" convention: each patch
> > > series tip is, when published, merged with -s ours with the previous
> > > published version.
> > > 
> > > You do have to strip the pseudo-merge before starting work with
> > > git-rebase, and then reapply it afterwards, but that is not
> > > particularly difficult (and some tooling would help).
> > 
> > Not just git-rebase; that approach requires using special tools around
> > any git command that operates on the actual patch series.
> 
> It does require some additional tooling, but it's quite modest.  The
> task is simple enough to be done by hand on an ad-hoc basis.

That means most single git commands that manipulate the repository
become non-trivial sets of git commands with boilerplate.  The very *first*
task that seems critical to make that workflow usable is automating away
that boilerplate with tooling.  And tooling needs to be able to work
with that history in an automated and reliable fashion.
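
To make that boilerplate concrete, here is a minimal sketch of the wrapper
each history-rewriting command would need under the pseudo-merge
convention.  It models commits in memory instead of running git;
`Commit`, `strip_pseudo_merge`, `reapply_pseudo_merge`, `with_series_op`
and `amend` are hypothetical names.  It assumes the pseudo-merge was made
with `git merge -s ours <previous published tip>`, so its first parent is
the series tip and its tree is that tip's tree.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical in-memory stand-in for a git commit (no real git calls)."""
    tree: str
    parents: Tuple["Commit", ...] = ()
    message: str = ""


def strip_pseudo_merge(published: Commit) -> Commit:
    """Return the real series tip hidden under a published pseudo-merge.

    Assumes a `-s ours` merge: first parent is the series tip, and the
    merge keeps that tip's tree unchanged.
    """
    if len(published.parents) != 2 or published.tree != published.parents[0].tree:
        raise ValueError("published tip does not look like a pseudo-merge")
    return published.parents[0]


def reapply_pseudo_merge(series_tip: Commit, published: Commit) -> Commit:
    """Publish again: keep the series tree, add the old published tip as 2nd parent."""
    return Commit(tree=series_tip.tree, parents=(series_tip, published),
                  message="pseudo-merge with previous published version")


def with_series_op(published: Commit, op: Callable[[Commit], Commit]) -> Commit:
    """The boilerplate around every history-rewriting command."""
    series_tip = strip_pseudo_merge(published)       # un-apply the pseudo-merge
    new_tip = op(series_tip)                         # e.g. rebase, amend, am
    return reapply_pseudo_merge(new_tip, published)  # keep the branch fast-forwarding


def amend(tip: Commit) -> Commit:
    """Hypothetical stand-in for `git commit --amend`: same parents, new tree."""
    return Commit(tree=tip.tree + "+fix", parents=tip.parents, message=tip.message)


upstream = Commit(tree="upstream")
v1_tip = Commit(tree="v1", parents=(upstream,), message="patch")
v1_published = reapply_pseudo_merge(v1_tip, Commit(tree="v0"))
v2_published = with_series_op(v1_published, amend)
```

Every command in the list quoted below would need this wrapper; a plain
`git commit` run without it lands on top of the pseudo-merge rather than
on the series itself.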

> >  You can't
> > just run "git cherry-pick", "git rebase", "git rebase -i", "git commit
> > --amend", "git am", or even just "git commit" without first un-applying
> > the merge commit.  Analysis tools like "git bisect" or "git blame" will
> > also find themselves unhappy.
> 
> This is not true.
[...]
> git operations which do not rewrite history and do not introduce
> additional parents, in particular cherry-pick and commit, work
> correctly and produce a new commit on top of the pseudo-merge.

That doesn't produce the correct history; it leaves the pseudo-merge in
the history of the new commit, so getting the correct commit history
requires rewriting history afterward.
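
A small model of the problem: once a plain `git commit` lands on top of
the pseudo-merge, the pseudo-merge and, through its second parent, the
*previous* version of the series both become part of the new commit's
history.  The commit names are made up, and `commit_range` is only a rough
in-memory analogue of `git rev-list upstream..fix`; nothing here calls git.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical in-memory commit: a name and its parents."""
    name: str
    parents: Tuple["Commit", ...] = ()


def reachable(tip: Commit) -> Set[str]:
    """Names of all commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        cur = stack.pop()
        if cur.name not in seen:
            seen.add(cur.name)
            stack.extend(cur.parents)
    return seen


def commit_range(base: Commit, tip: Commit) -> Set[str]:
    """Reachable from tip but not from base, like `git rev-list base..tip`."""
    return reachable(tip) - reachable(base)


upstream = Commit("upstream")
v1_patch = Commit("v1-patch", (upstream,))             # previous version of the patch
v2_patch = Commit("v2-patch", (upstream,))             # current version of the patch
pseudo = Commit("pseudo-merge", (v2_patch, v1_patch))  # published v2 tip
fix = Commit("fix", (pseudo,))                         # plain `git commit` on top

series = commit_range(upstream, fix)
```

The range that should contain only the current series (`v2-patch` and
`fix`) also contains the merge and the superseded `v1-patch`.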

In addition, based on your description, how would you actually
distinguish pseudo-merges from other merge commits in the history, given
that the user can make any arbitrary commit on top of it, including a
merge commit?
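
One obvious way a tool might try to recognize pseudo-merges is to look for
two-parent commits whose tree matches the first parent, which is what
`git merge -s ours` produces.  That heuristic is an assumption here, not
git-dpm's or any tool's documented rule; the sketch shows it also matches
an ordinary user merge whose incoming change was already present (for
example, previously cherry-picked).  All names and tree IDs are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical in-memory commit: name, tree ID, parents."""
    name: str
    tree: str
    parents: Tuple["Commit", ...] = ()


def looks_like_pseudo_merge(c: Commit) -> bool:
    """Hypothetical heuristic: a two-parent merge that keeps its first parent's tree."""
    return len(c.parents) == 2 and c.tree == c.parents[0].tree


base = Commit("base", "T0")

# A real pseudo-merge of series v2 with the previously published v1.
v1 = Commit("v1", "T1", (base,))
v2 = Commit("v2", "T2", (base,))
pseudo = Commit("pseudo", "T2", (v2, v1))

# An ordinary user merge of a topic branch whose change main already has:
# the merge result is identical to main's tree.
main = Commit("main", "T3", (base,))
topic = Commit("topic", "T-topic", (base,))
user_merge = Commit("user-merge", "T3", (main, topic))

# A merge that actually combines content is not flagged.
real_merge = Commit("real-merge", "T4", (main, topic))
```

Both `pseudo` and `user_merge` satisfy the heuristic, so a tool walking
this history cannot tell which merge carries series metadata without some
additional, explicit marker.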

This workflow only seems viable if you consider commit history and
commit IDs unimportant, and only care about tree objects.

> > In addition, a "pseudo-merge" encodes non-trivial metadata into the
> > parent list of commits, making it more difficult to handle things like
> > merges between meta-commits, or conversely the history of commits that
> > themselves include merges.  How can you follow the history of such a
> > patch series, and tell the difference between meta-commits and commits?
> 
> In practice this does not seem to be a problem.  git-dpm uses a
> similar representation.

As far as I can tell, the git-dpm representation depends on constraining
the type of history it can represent: it expects a linear string of
patches for each version, and AFAICT a linear string of meta-history.
That seems somewhat defensible in the context of Debian package
management, though it wouldn't cope too well with asynchronous NMUs
based on the last uploaded version if further development has occurred
in the repository.

> > Some of those problems seem fixable; you could define a precise format
> > based on pseudo-merge commits, including all the same metadata, a
> > precise definition for which parents refer to other pseudo-merges and
> > which ones refer to versions of the series, and so on.  I don't see the
> > advantage of such a format, though.
> 
> The advantage is that the interchange format between users of the
> pseudo-merge and patch stack management tool is directly useable by
> naive consumers of git histories.

Either way the history you have is not the history you want without
further processing.  With the pseudo-merge format, only the tree is
directly usable.

> > > I intend to provide some tooling to support this workflow, because I
> > > think this workflow would work well with dgit.  It produces a
> > > fast-forwarding branch containing the intended output tree objects.
> > 
> > "tree objects" alone don't suffice.  They provide enough information to
> > extract a source tree, but then, so do archived source tarballs.  That
> > doesn't suffice for collaborative development processes anymore; for
> > that, you need commit objects.
> 
> I think you have misunderstood me.  I meant: unlike your `git series',
> my approach produces a fast forwarding branch whose commits contain
> the intended actual tree objects for the program source code.

I understood you, I just disagree.  I consider the tree objects
insufficient, and a representation that only preserves trees but not
commit history and commit IDs does not solve the problem I set out to
solve.
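
Why trees alone don't suffice: a commit ID hashes the parent IDs together
with the tree, so the same tree with a different history yields a
different commit ID.  A minimal sketch of how git hashes a commit object,
assuming the default SHA-1 object format; the identity, timestamp, message
and parent IDs below are made up for illustration.

```python
import hashlib


def git_commit_id(tree: str, parents: list[str], ident: str, message: str) -> str:
    """SHA-1 over a 'commit <size>' header, a NUL byte, then the commit body."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree ID
IDENT = "A U Thor <author@example.com> 1471041614 -1000"  # made-up identity
MSG = "Add frobnication\n"                                 # made-up message
PARENT_A = "1" * 40                                        # hypothetical parent IDs
PARENT_B = "2" * 40

same_tree_a = git_commit_id(EMPTY_TREE, [PARENT_A], IDENT, MSG)
same_tree_b = git_commit_id(EMPTY_TREE, [PARENT_B], IDENT, MSG)
```

Only the tree is shared; every commit built on top of these two differs
too, so a representation that keeps trees but not commits loses the
commit IDs that other people's work refers to.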

> > > Series cover letters are less important for Debian so I haven't
> > > thought about that much but the obvious answer is to have an "empty"
> > > commit at the base of the stack.
> > 
> > Git really despises empty commits and does its best to destroy them at
> > every turn.  (This partly comes from trying to drop already-applied
> > commits when rebasing on a newer upstream.)  I would not recommend
> > putting any data you value into an empty commit.
> > 
> > In addition, this would make the patch series completely unusable with a
> > "git pull" or "git push" workflow, since upstream would not want that
> > empty commit in its history; you could *only* send such a patch series
> > via email.
> 
> In my view, the `git pull' format is an output format.

I think that represents one of several fundamental differences here, yes.
In my view, anything *but* a hash-preserving git commit history is a
(lossy) output format.
