Re: [RFC] General Resolution to deploy tag2upload
Ansgar, Joerg,
Discussion has died down without a resolution of our impasse, but Ian
sent a very long message, so perhaps you are working through it.
Could you let me know if you are still working on further responses, and
if so, roughly how long you think you need?
Thanks.
On Sat 22 Jun 2024 at 01:18pm +01, Ian Jackson wrote:
> Russ Allbery writes ("Re: [RFC] General Resolution to deploy tag2upload"):
>> So yes, you're right, the git-debrebase example is not nearly as
>> interesting as I had thought because the tooling works differently than I
>> had realized.
>
> As ever, it's all more complicated than you thought (and than you now
> think). I'm going to give just a few examples of the frantic paddling
> that dgit is doing underneath the waterline. This is therefore an
> *extremely* long message.
>
> First, though, I want to summarise:
>
> In this message I discuss in some detail five packaging workflows.
>
> For the three current workflows I discuss their workings in some
> detail; I explain some of the wrinkles, anomalies and complications
> that dgit currently deals with, and that tag2upload takes care of.
> For the two future workflows - one near future, and one speculative -
> I sketch out what the support might look like.
>
> I also discuss my understanding of the alternative design proposed by
> some ftpmasters. In each case, the tag2upload design handles the
> situation well. In each case, the alternative design works
> significantly less well, or requires significantly more complexity in
> more places - usually both. In some cases the alternative design
> can't sensibly work at all.
>
> I want to emphasise that these are *examples*. I feel we have
> spent much of this thread (and much of previous conversations)
> playing whack-a-mole with "but you could fix that anomaly by doing
> X" and "you could handle that other situation by doing Y". Where "X"
> and "Y" are each not great, but perhaps might be tolerable, if they
> were the only limitation.
>
> So, yes, it is true, that in *some* of these cases, including
> perhaps many actual packages in practice, the alternative design
> could be made to work. But the alternative design does *not* solve
> all the problems that tag2upload does, and the problems that it
> does solve it handles in a more complicated and ugly way, with more
> limitations.
>
> Taking this all together, the alternative proposal is sufficiently
> limited in scope, and poor in its outcomes, that it's not worth
> pursuing.
>
>
> Right then.
>
>
> 1. git-debrebase:
>
> Firstly, this is one of the easier cases from tag2upload's point of
> view. git-debrebase is modern and git-based, so has fewer warts.
>
> It's true that git-debrebase can make patches.
>
> But, the calls to git-debrebase that you make as a maintainer do not
> make any patches in debian/patches. Indeed, usually, if git-debrebase
> finds anything in debian/patches, it simply deletes it all.
>
> What happens is that dgit has special knowledge about git-debrebase:
> it knows that git-debrebase can make patches. (This is actually there
> as an optimisation: git-debrebase can make patches much faster.)
>
> When you do `dgit push-source` (which is how git-debrebase users
> upload), dgit knows it needs to maybe make patches, because that's how
> a "3.0 (quilt)" source package works. This is the "quilt-fixup" step
> of uploading, which is what (for historical reasons) the source
> package canonicalisation is called.
>
> So iff you are using git-debrebase with "3.0 (quilt)", dgit uses
> git-debrebase to make the patches and commit them to your branch.
>
> However, you can also run `dgit push-source --split-view=always`.
> This is an alternative workflow. In that case, the synthetic git
> commits which introduce d/patches don't end up in your own maintainer
> git branch. (I'm not sure Russ knows this feature exists.) This mode
> is nicer because you don't get diff noise about changes to the
> completely autogenerated contents of d/patches. Specifically, without
> the split view, each upload introduces a bunch of patches onto the
> maintainer branch, which the next run of git-debrebase after the
> upload immediately deletes.
>
> So in that case the maintainer branch never has patches and isn't
> treesame to a "3.0 (quilt)" source package.
>
> Also! You can use git-debrebase with 1.0-with-diff, or with 1.0
> native. (I'm not sure Russ knows this, either.) This is often a nice
> way of working, for a small package which usually has an empty or tiny
> patch queue. If you do that then there are no patches, ever, just git
> commits and an output tarball. And, there's a wrinkle: you can't use
> git-debrebase with "3.0 (native)" because of a bug in dpkg-source [1].
>
> So whether there are patches depends on the maintainer workflow, the
> intended source package format, and the surrounding context (eg
> sponsorship), and they are made by dgit, which calls out to
> git-debrebase as an optimisation.
>
>
> Relationship to tag2upload:
>
> git-debpush and the tag2upload tag don't know anything about any of
> this chaos. git-debpush simply signs a tag saying "this git branch is
> in a format suitable for quilt fixup in linear patches mode".
>
> git-debpush has *no* code to deal with any of the above. All of this
> is left to the tag2upload service.
>
> With a git-based sponsorship workflow, the sponsor may not need to
> learn git-debrebase. They can review the git *tree*, diffing it
> against the upstream (ideally, upstream's signed tag), and likewise
> they can diff it against the previous upload. They'll declare the
> nicely predictable "linear" workflow mode in their tag. They can be
> sure that the output source package will be precisely the code they've
> reviewed in git.
>
> (git-debpush does have one piece of git-debrebase-specific knowledge -
> an overrideable sanity check to guard against a user error causing an
> anomalous branch state. It's 9 lines of code - and nothing to do
> with source package construction or package contents. This sanity
> check is not an essential part of git-debpush, and another tag
> generation utility, or a human, could omit it.)
>
>
> ftpmaster's alternative design, AIUI:
>
> (Here I'm going to compare tag2upload with the alternative design
> where the uploader signature covers a manifest of all the files in the
> unpacked source package - ie, of the result of dpkg-source -x. The
> ftpmasters haven't produced a complete design, but I think I can infer
> the properties that a full proposal would have.)
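
To make the posited manifest concrete, here is a rough sketch. The
ftpmasters have not published a format, so everything in it is an
assumption made for illustration only: the function name
`source_manifest`, the choice of SHA-256, and the (path, digest) layout
are all hypothetical.

```python
import hashlib
from pathlib import Path


def source_manifest(root: str) -> list[tuple[str, str]]:
    """Return sorted (relative path, SHA-256 hex digest) pairs.

    Hypothetical format: one entry per regular file found under the
    unpacked source tree `root`. Symlinks and directories are skipped.
    """
    base = Path(root)
    entries = []
    for path in base.rglob("*"):
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            entries.append((path.relative_to(base).as_posix(), digest))
    # Sort so that the manifest does not depend on directory walk order.
    return sorted(entries)
```

The point to notice is the input: the tree has to be the patches-applied
result of `dpkg-source -x`, so whoever signs such a manifest must first be
able to produce that tree.
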
>
> In this alternative design, software making an upload intent tag for a
> git-debrebase package would need code to generate the contents of
> debian/patches. Realistically, that means it needs a copy of
> git-debrebase.
>
> And, the person authorising the upload now needs to learn about and
> run and trust git-debrebase, which in our design they often didn't.
>
>
> 2. linear quilt mode, especially with NMUs
>
> I'm going to explain this in terms of git-based NMUs. Similar
> situations can arise in other contexts, including certain (I think
> not widely used) maintainer workflows.
>
> When doing an NMU with git, you first obtain a suitable
> patches-applied git branch from somewhere. (Currently `dgit clone` is
> the best way to do that, but tag2upload will open up the
> possibility[2] of making it be just a `git clone` in the future.)
>
> You then make commit(s) representing your changes, and test them.
> (NB that testing them doesn't necessarily involve making a
> "3.0 (quilt)" source package. You can build binaries from git.)
>
> When you're happy, you file the NMUdiff bug report (you can use
> git-format-patch or git-diff for this), and you `dgit push-source`.
> Note that at no point have you done anything with d/patches.
>
> So at this stage, your git working tree has some applied patches in
> d/patches, plus also some changes that are only in git commits.
>
> dgit knows how to figure out *which* git commits need making into
> patches, which is a nontrivial problem. The basic algorithm is to
> calculate what the tree looks like if you take the orig tarball and
> apply the contents of debian/patches - that gives dgit the tree at the
> last upload. Then dgit walks backwards through the git history hoping
> to find a commit whose tree matches that last upload. Then it can
> walk forward again and make patches out of the commits.
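
The backwards-then-forwards walk described above can be sketched as
follows. This is a minimal illustration, not dgit's code: it assumes a
strictly linear history, and the names `Commit` and
`commits_needing_patches` are hypothetical. The tree of the last upload
(orig tarball plus debian/patches) is taken as a precomputed input.

```python
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    commit_id: str
    tree_id: str            # identifier of the tree this commit records
    parent: Optional[str]   # single parent: linear history assumed


def commits_needing_patches(
    history: dict[str, Commit], head: str, last_upload_tree: str
) -> list[str]:
    """Walk back from `head` to the commit whose tree equals the last
    upload, then return the later commits in oldest-first order."""
    pending = []
    cursor: Optional[str] = head
    while cursor is not None:
        commit = history[cursor]
        if commit.tree_id == last_upload_tree:
            pending.reverse()  # walked newest-first; patches go oldest-first
            return pending
        pending.append(commit.commit_id)
        cursor = commit.parent
    raise LookupError("no commit matches the tree of the last upload")
```

Each returned commit would then become one patch. Merges, and histories
in which no commit matches, are exactly the cases this sketch leaves out.
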
>
> There's more. dgit wants to make patches that the NMU recipient won't
> object to. So, we can't just use gbp pq because some maintainers
> don't like its output and want the patches in closer to DEP-3 format.
> Therefore, dgit makes these patches by calling `dpkg-source --commit`
> with a stunt value of `EDITOR`.
>
> Again, all of this is only necessary with "3.0 (quilt)". It also
> depends on the archive contents - it's important to be using the orig
> tarball from the archive.
>
> Finally, did you know that dpkg-source and git can disagree about the
> meaning of patches? There are patches that dpkg-source can apply, but
> which git fails on. There are also patches that they *both* apply,
> but *disagree* about the meaning of! Real packages, including highly
> important core packages, are sometimes afflicted. dgit has code in it
> to deal with that too.
>
>
> Relationship to tag2upload:
>
> Once again, git-debpush and the tag2upload tag don't know anything
> about any of this chaos. git-debpush simply signs a tag saying "this
> git branch is in a format suitable for quilt fixup in linear patches
> mode".
>
>
> ftpmaster's alternative design, AIUI:
>
> In the alternative design it is probably not feasible to support NMUs
> of arbitrary "3.0 (quilt)" packages.
>
> Likewise maintainer workflows that rely on dgit's sophisticated git to
> quilt linearisation algorithm are also not supportable.
>
>
> 3. gbp
>
> git-buildpackage and gbp pq, and its patches-unapplied branch format,
> are probably the most common workflow in Debian right now.
>
> With gbp pq, the maintainer's DEP-14 tag (the tag2upload tag) is on
> that unapplied branch. With a "3.0 (quilt)" source package, it is not
> actually strictly necessary to apply the patches to make the source
> package, since the applied form of the files is not directly
> represented. Instead, dpkg-source applies the patches on extraction.
>
> But there is a wrinkle. gbp inherits a bug in dpkg-source[4]: if the
> maintainer has edited the upstream .gitignore, in their git
> representation, this is *not* represented in the source package
> generated by git-buildpackage. IMO this is a clear DFSG violation[5].
>
> If the maintainer uses `dgit push-source --quilt=gbp`, dgit will spot
> this situation and make an additional patch in debian/patches,
> representing the maintainer's edits to .gitignore. That patch appears
> only in the canonical git branch and the source package, not in the
> maintainer's view of debian/patches.
>
>
> How does this relate to tag2upload?
>
> The tag2upload git tag does not contain any detailed information about
> any of this. It simply specifies that the quilt mode `gbp` should be
> used. The tag2upload server does all the work.
>
> (git-debpush *does* contain an overrideable sanity check that upstream
> files match and the patches apply. Again, this is not an essential
> part of its functionality and another signing tool wouldn't need it.)
>
>
> ftpmaster's alternative design, AIUI:
>
> The alternative design I've been positing supposes including a
> manifest of the contents of the unpacked source package. Ie, patches
> applied.
>
> In that alternative design, any utility which wanted to make an upload
> intent tag would need to be able to apply the patches. The patch
> application code becomes an essential part of the tag generation
> software.
>
> Also, the tag generation utility would need to have special knowledge
> about .gitignore. There are two options here: (1) have code to find
> the upstream .gitignores, compare them with the maintainer's
> .gitignores, and generate a synthetic patch. Or, (2) find the
> upstream .gitignores and arrange to include the hashes of the upstream
> .gitignores rather than the maintainer's .gitignores in the manifest
> (which IMO violates the DFSG [5]). In either case the tag generation
> utility needs special knowledge about gbp's .gitignore behaviour. Or
> of course we could: (3) don't let maintainers edit or add .gitignore
> in the upstream part of the package.
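
Option (1) above amounts to something like the following sketch. It is
an illustration under stated assumptions, not what dgit does internally:
the function name `gitignore_patch` is hypothetical, both versions of
the file are passed in as text ending in a newline, and Python's
standard `difflib` stands in for whatever diff tool a real
implementation would use.

```python
import difflib


def gitignore_patch(upstream: str, maintainer: str,
                    path: str = ".gitignore") -> str:
    """Return a unified diff turning upstream's file into the
    maintainer's, or an empty string if the two are identical."""
    if upstream == maintainer:
        return ""  # nothing to represent in debian/patches
    diff = difflib.unified_diff(
        upstream.splitlines(keepends=True),
        maintainer.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```

A tag generation utility taking this route would also have to locate
every upstream .gitignore and give each synthetic patch a name and
headers, and none of that is shown here.
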
>
>
> 4. git-debcherry
>
> git-debcherry is an interesting git patch workflow utility. It is not
> currently supported by dgit, but that's not because it's impossible,
> or even particularly difficult. We just haven't got around to it. [6]
>
> I don't fully understand git-debcherry, but AIUI the basic principle
> is that it is a tool for constructing debian/patches based on a
> patches-applied maintainer branch. It has an interesting algorithm
> with some nice properties, including that it doesn't constrain the
> maintainer git branch structure.
>
> Only git-debcherry knows what patches it's going to produce, and
> it takes the orig tarball as an input.
>
> Support in dgit would be to have dgit call git-debcherry at an
> appropriate point in the source package construction (during what dgit
> calls "quilt fixup").
>
>
> Relationship to tag2upload:
>
> tag2upload doesn't support this yet, but it could do. We would add
> the support in dgit, and when that was deployed to the tag2upload
> server, git-debcherry would be useable with tag2upload right away.
>
> As with the other workflows, git-debpush wouldn't need any code
> specific to git-debcherry. Like the other patches-applied workflows,
> the authorising uploader (eg, a sponsor) does not need to understand,
> or run, git-debcherry.
>
>
> ftpmaster's alternative design, AIUI:
>
> git-debcherry uses the orig tarball, so it couldn't be supported,
> since the uploading developer doesn't have any tarballs.
>
> It might be supportable if we also made changes to git-debcherry, to
> allow it to work off an upstream git tag instead.
>
>
> 5. language team monorepos
>
> Several teams handling upstream language-specific package managers
> have a monorepo on salsa containing metadata and patches. I'm aware
> of at least Rust and Haskell working this way. The precise contents
> of the monorepo vary, and each team has team-specific tooling.
>
> The fragmentation is a problem, and the workflows can be very awkward.
> Typically .dscs are constructed on maintainer laptops using
> team-specific tooling, taking both the team monorepo and upstream
> artifacts as inputs.
>
> None of these are supported by `dgit push-source` right now. It would
> be nice to be able to improve this, by formalising and streamlining
> the conversion process including source package construction. I think
> that would be possible in principle, but the design space is large and
> as far as I'm aware there haven't been any serious conversations,
> involving both source handling experts (like the dgit team) and
> multiple monorepo packaging teams, about common aspects of their
> workflows, differing requirements, etc.
>
> (I should say that at least for Rust, which I know very well, I have
> serious doubts as to whether the monorepo is the right approach, but
> that's a whole other can of worms.)
>
>
> Relationship to tag2upload:
>
> If we deploy tag2upload, we'll be greatly streamlining the usual
> upload case. This will increase the gap between the existing monorepo
> workflows on the one hand, and the majority of packages (which are
> supported by tag2upload) on the other hand.
>
> The potential gains from improving the monorepo workflows will be
> bigger, and also more evident to a wider set of people.
>
> In summary, supporting monorepo team(s) with more-git-based workflows
> is probably possible, in the medium to long term. I think it's likely
> to happen with tag2upload.
>
>
> ftpmaster's alternative design, AIUI:
>
> I think the alternative design couldn't ever handle multi-package
> monorepos in the style of the Rust or Haskell teams.
>
>
>
> Ian.
>
>
> Footnotes.
>
>
> [1] dpkg-source hates "3.0 (native)" with non-native version,
> despite TC request to please allow it:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=737634#107
>
> [2] To support NMUs based on just "git clone" we'd need to start
> importing every non-git-based[3] .dsc into git, which isn't a sensible
> thing to do until the git repository and everything is scaled up due
> to git-based .dscs being more common, which will be an effect of
> tag2upload.
>
> [3] By "git-based" I mean that the .dsc tells you which git commit it
> was made from, and the git tags etc. tell you how. I don't mean to
> include ad-hoc source package construction from untraceable git trees
> using untracked software on maintainer laptops.
>
> [4] The dpkg-source bug about the .gitignore DFSG violation:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908747
>
> [5] Reading the bug report[4] it's clear that not everyone agrees that
> discarding our .gitignore changes is a DFSG violation. I find that
> position quite implausible but I'm hoping we don't need to resolve it
> here.
>
> [6] dgit feature request ticket "want dgit --quilt=debcherry"
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930881
--
Sean Whitton