Re: [ANNOUNCE] git-series: track changes to a patch series over time

To: Ian Jackson <ijackson@chiark.greenend.org.uk>
Cc: debian-devel@lists.debian.org
Subject: Re: [ANNOUNCE] git-series: track changes to a patch series over time
From: Josh Triplett <josh@joshtriplett.org>
Date: Fri, 12 Aug 2016 04:05:05 -1000
Message-id: <20160812140505.nztdjzpa5nzwppxg@x>
In-reply-to: <22445.45989.114882.318467@chiark.greenend.org.uk>
References: <20160729065717.GA25788@x> <22444.27780.310164.556317@chiark.greenend.org.uk> <20160812035657.c72mu75x5bmygvva@x> <22445.45989.114882.318467@chiark.greenend.org.uk>

On Fri, Aug 12, 2016 at 12:31:49PM +0100, Ian Jackson wrote:
> Josh Triplett writes ("Re: [ANNOUNCE] git-series: track changes to a patch series over time"):
> > On Thu, Aug 11, 2016 at 01:16:04PM +0100, Ian Jackson wrote:
> > > My biggest question therefore is: how does your tool compare to
> > > stgit ?  Why should we use your tool rather than stgit ?
> > 
> > While stgit does track the history of changes made to the stack, as far
> > as I can tell, it doesn't do so in a manner meant for interchange
> > between users.  stgit works locally for one user, but doesn't seem to
> > support multiple users.  And the history of the patch stack doesn't
> > include commit messages, nor does it group changes into logical commits.
> > It seems more like the reflog (a tool to rescue old bits) than a
> > historical record.
> 
> I don't understand what distinction there is between multiple users
> and multiple development efforts by the same user.  Or maybe I don't
> understand what you mean by `support multiple users'.

I mean multiple users collaborating on a single patch series in a
distributed way, or even a single user developing the patch series and
multiple users wanting to consume and view the history of that series,
or a single user developing a patch series and using more than one
development system to do so.

stg can publish/format the patches themselves, but doesn't have a
documented format for publishing the history of the patch series.  It
keeps some internal records of that history, but those records don't
have the concept of commit messages, well-defined points of
working-ness, or anything you'd expect from a version control history.
Certainly not something you'd want to publish.

Hence my comparison to reflog.  If you did enough digging, you could
probably rescue old versions of the patch series, just as you can get
rebased-away heads from reflog for a while, but you'd have to do a lot
of sifting through whatever random intermediate steps of patch shuffling
you might have done.

> stg publish seems to be the tool you use for sharing stg branches.

Not as far as I can tell.  stg publish seems write-only (with no
symmetric command to recover the full patch series, let alone its
history), and focused exclusively on producing a fast-forwarding branch
that has the correct tree, not on preserving the exact state *and*
history of the patch series.

For instance, if you delete a patch from the series or edit one, stg publish
(as documented at https://stgit.org/stg-publish.html) "creates a new
commit on the public branch having the same tree as the stack but the
public head as its parent".  How would you turn the result back into a
patch series?  You'd have to do some manual archaeology, made more
complex if you've made any other changes to the series at the same time.
And none of that published history (with or without transformation)
would work to push to the upstream project, or for the upstream project
to "git pull", or even to "git format-patch" into a patch series.

> NB I'm not much of an stg user.

I used it a long time ago, but I haven't used it in years.

> > > My next question is: how do you handle merging of changes made in
> > > parallel in different meta-branches of the same series ?  I don't mean
> > > just aggregating patches, but other common operations such as:
> > > reordering of patches; editing patch commit messages (or the cover
> > > letter); splitting and merging patches; git rebase --autosquash; etc.
> > > 
> > > I didn't see anything in the docs about this.  And I confess I didn't
> > > run your code to do any experiments.
> > 
> > git-series does support merge commits within the series branch; see the
> > section "git-series commits" in INTERNALS.  Right now, git-series
> > doesn't create those merge commits for you, but I plan to add a
> > mechanism to support that.  That'll probably start out as "here's two
> > patch series, tell me when you've finished creating the merged version
> > and I'll commit it", though I could imagine handling many simple cases
> > more automatically.  I hope that building a simple tool and
> > incrementally improving it will work.
> 
> I think this is the key area of difficulty which stops people sharing
> patch series as much as they like, at least as much as the lack of a
> fast-forwarding view.

Agreed.  Having a format to store and interchange patch series history
seems like a necessary first step.  Making it possible to store merges
seems like another.  Making it possible to *produce* them would help
further.

One step at a time, though. :)

> > > I did read the INTERNALS document about the data structures.  I wonder
> > > why you rejected other possibilities.  In particular, your top level
> > > `git series' branch data structure is not directly useable by any
> > > other tool; it needs to be dereferenced/converted, to produce a
> > > useable commit.  Did you consider recording the metadata as dotfiles
> > > in tree objects, or some such ?
> > 
> > I started with a few fundamental constraints:
> > - The commits tracked by the series *must* remain directly usable as
> >   commits in the underlying project, whether by sending patches or by
> >   pushing/pulling.
> > - git must find every object in the history of a series reachable from a
> >   ref, so that fsck/repack/prune/etc cannot discard series history.
> > - Similarly, `git push` and `git fetch` must work on series commits, and
> >   must transmit/receive the full series history with a series branch,
> >   without requiring any additional commands or special "series" versions
> >   of push/fetch.
> > 
> > These constraints limit where metadata can live.  Adding any dotfiles to
> > the commits in the patch series would mean the resulting patches would
> > include those dotfiles.  Any metadata added to commit messages would end
> > up in patches; note that several projects, including the Linux kernel,
> > have complained about patches that include Gerrit "Change-Id" tags.  Any
> > format that stored patches within a series commit, rather than full
> > links to commits for the patches, would not leave the commits themselves
> > usable by git.
> 
> The usual approach taken by other patch stack tools is to treat picky
> upstreams, like people who object to Change-Id, as an output format.
> 
> Those picky upstreams are likely to rewrite (or reapply) a series, so
> what ends up in the upstream tree won't be the same commit objects
> (and perhaps not the same tree objects) anyway.

As mentioned above, I specifically started from the constraint that
the series tracks commits, not just trees; the commits themselves,
complete with their commit IDs, represent one of the artifacts the
series tracks untouched.

While email workflows effectively turn into "git am" which amounts to a
rebase (though `git format-patch` has started to fix that with the
introduction of base-commit metadata), "please pull" workflows typically
involve merges of your actual commits.  (That includes "git
request-pull", "git series req", or a GitHub/GitLab/etc pull request.)
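
A minimal Python sketch of why reapplying a patch on a different parent
yields a different commit, while pulling preserves the commit ID.  It
hashes a commit object the way git does (SHA-1 over a "commit <size>\0"
header plus the commit body).  The author identity, timestamp, message,
and both parent IDs are made up for illustration; only the empty-tree ID
is a real git value.

```python
import hashlib

def commit_id(tree, parent, message,
              who="A U Thor <author@example.com> 1470000000 +0000"):
    """Compute the SHA-1 object ID git assigns to a one-parent commit."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n"
        f"{message}\n"
    ).encode()
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()

TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
OLD_BASE = "1" * 40  # hypothetical parent the patch was written against
NEW_BASE = "2" * 40  # hypothetical parent upstream reapplies it onto

original = commit_id(TREE, OLD_BASE, "Fix frobnication")
reapplied = commit_id(TREE, NEW_BASE, "Fix frobnication")  # "git am" style
pulled = commit_id(TREE, OLD_BASE, "Fix frobnication")     # same object

print(original == reapplied)  # False: new parent, new commit ID
print(original == pulled)     # True: a pull keeps the commit itself
```

Even with an identical tree, message, and author, changing the parent
changes the ID, which is why a series that tracks commit IDs survives a
pull but not a reapply.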

> > Do you see another possible storage format that meets all the
> > constraints above?
> 
> Well, there is the obvious "pseudo-merge" convention: each patch
> series tip is, when published, merged with -s ours with the previous
> published version.
> 
> You do have to strip the pseudo-merge before starting work with
> git-rebase, and then reapply it afterwards, but that is not
> particularly difficult (and some tooling would help).

Not just git-rebase; that approach requires using special tools around
any git command that operates on the actual patch series.  You can't
just run "git cherry-pick", "git rebase", "git rebase -i", "git commit
--amend", "git am", or even just "git commit" without first un-applying
the merge commit.  Analysis tools like "git bisect" or "git blame" will
also find themselves unhappy.

Pulling the commit out of a pseudo-merge doesn't seem any less or more
difficult than pulling out the "series" entry from a git-series commit;
either way, you have a wrapping commit for metadata from which you need
to extract the underlying commit you want.  And the "pseudo-merge"
convention doesn't track a cover letter, the base of a series, or any
other metadata.
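
The comparison can be sketched with a toy Python model in which objects
are plain dicts and IDs are made-up strings.  The "series", "base", and
"cover" entry names follow the ones used in this thread; the parent
list of the series commit is simplified to a single previous
meta-commit (INTERNALS describes the real layout), and the pseudo-merge
is assumed to have the new tip as its first parent.

```python
# Toy model only: IDs are invented strings, not real object IDs.
series_commit = {
    "parents": ["meta-prev"],            # simplified; see INTERNALS
    "tree": {
        "series": "tip-v2",              # top commit of the patch series
        "base": "upstream-1.0",          # commit the series applies on
        "cover": "Cover letter for v2",  # cover letter text
    },
}

pseudo_merge = {
    # Assumed convention: first parent is the new tip, second parent is
    # the previously published version (merged with -s ours).
    "parents": ["tip-v2", "published-v1"],
    "tree": "<same tree as tip-v2>",
}

def series_tip(commit):
    """Extract the underlying commit from a git-series style wrapper."""
    return commit["tree"]["series"]

def pseudo_merge_tip(commit):
    """Extract the underlying commit from a pseudo-merge wrapper."""
    return commit["parents"][0]

print(series_tip(series_commit))         # tip-v2
print(pseudo_merge_tip(pseudo_merge))    # tip-v2
print(series_commit["tree"].get("cover"))
print("cover" in pseudo_merge)           # False: nowhere to put it
```

Both wrappers need one lookup to reach the real commit; only the first
has named slots for the base and cover letter.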

In addition, a "pseudo-merge" encodes non-trivial metadata into the
parent list of commits, making it more difficult to handle things like
merges between meta-commits, or conversely the history of commits that
themselves include merges.  How can you follow the history of such a
patch series, and tell the difference between meta-commits and commits?

Some of those problems seem fixable; you could define a precise format
based on pseudo-merge commits, including all the same metadata, a
precise definition for which parents refer to other pseudo-merges and
which ones refer to versions of the series, and so on.  I don't see the
advantage of such a format, though.

> I intend to provide some tooling support this workflow, because I
> think this workflow would work well with dgit.  It produces a
> fast-forwarding branch containing the intended output tree objects.

"tree objects" alone don't suffice.  They provide enough information to
extract a source tree, but then, so do archived source tarballs.  That
doesn't suffice for collaborative development processes anymore; for
that, you need commit objects.

> Series cover letters are less important for Debian so I haven't
> thought about that much but the obvious answer is to have an "empty"
> commit at the base of the stack.

Git really despises empty commits and does its best to destroy them at
every turn.  (This partly comes from trying to drop already-applied
commits when rebasing on a newer upstream.)  I would not recommend
putting any data you value into an empty commit.

In addition, this would make the patch series completely unusable with a
"git pull" or "git push" workflow, since upstream would not want that
empty commit in its history; you could *only* send such a patch series
via email.

> > For repositories, you can push the series branch directly if you want to
> > provide the history of your series, or you can push the current version
> > (or an older version) of the patch series if you just want to publish
> > that version.
> 
> Neither of these is compatible with dgit, of course.

The former seems the easiest to interoperate with: dgit could easily
learn to receive a series commit and turn it into a source package ready
for upload.

In addition, the git-series format has the advantage of chaining back to
the upstream git history; the "base" for each series commit would refer
to an upstream commit, and then the "series" provides a patch series on
top of that.  Most git repositories for packaging end up using unusual
mechanisms to interoperate with upstream version history, or they don't
do so at all.
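
As a rough illustration of how "base" and "series" together identify
the patches, here is a toy Python walk over a hypothetical linear
history (the commit names and the parent map are invented; real
histories would be walked by git itself, and may contain merges).

```python
def patches(parents, base, series):
    """Return the commits after `base` up to `series`, oldest first.

    `parents` maps each commit to its single parent (linear history).
    """
    found = []
    current = series
    while current != base:
        if current is None:
            raise ValueError("base is not an ancestor of series")
        found.append(current)
        current = parents.get(current)
    return found[::-1]

# Hypothetical history: two upstream commits, then three patches.
parents = {
    "upstream-0.9": None,
    "upstream-1.0": "upstream-0.9",
    "patch-1": "upstream-1.0",
    "patch-2": "patch-1",
    "patch-3": "patch-2",
}

print(patches(parents, "upstream-1.0", "patch-3"))
# ['patch-1', 'patch-2', 'patch-3']
```

Everything at or below the base is upstream history; everything above
it, up to the series tip, is the patch series.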
