Re: [RFC] General Resolution to deploy tag2upload
So, first, I owe you and the FTP team an apology. I was totally convinced
that there had been more recent public discussions of tag2upload involving
the FTP team than 2019. I either got confused with other discussions or
had the increasingly common problem of thinking that things that happened
ten years ago had only happened two years ago. Regardless, I really
should have checked, I didn't, I made an incorrect assumption, and I
apologize.
I would not, as a general rule, assume that any delegate decision made
five years ago still holds today, and if I had not made erroneous
assumptions about the timeline, I would have phrased several of my
messages here differently. This is entirely my fault.
So far, from this thread, it looks like the decision from 2019 may still
stand, but I think there are still places to explore.
Joerg Jaspert <joerg@debian.org> writes:
> On 17261 March 1977, Russ Allbery wrote:
>> Why is this your red line? Is it only that you don't want to add
>> another system to the trusted set, or is there something more specific
>> that you're concerned about?
> There ought to be one point that is doing this step, not many, yes.
> Includes that it is the delegated work and task description of FTPMaster
> to do this, though that can be addressed by either us ending up running
> it, or adjusting delegations. Not sure the latter ends up with happy
> people, but is one existing way.
Elsewhere in this thread, Jessica Clarke made the excellent suggestion
that perhaps the authentication check concern could be resolved by dak
providing an API for performing the authentication and authorization
check. I am embarrassed that I didn't think of that; thank you very much
to Jessica for that suggestion.
That gives me some hope that this point has a relatively neat solution, so
I'm going to focus on exactly what dak needs the uploader signature to
cover in order to accept the package.
> Also, currently we have the nicety that we store all signatures directly
> besides the source package, available for everyone to go and check.
> Linking back to the actual Uploader, not to a random service key. You
> can take that, run a gpgv on it and via the checksums of the files then
> see that, sure, this is the code that the maintainer took and uploaded.
> You do *not* need to trust any other random key on that. Not that of
> tag2upload. *AND* not that of FTPMaster.
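For anyone who hasn't done that check by hand, the checksum half of it can
be sketched roughly as follows. This is only an illustration, not anything
dak or dgit actually runs: the function names are made up, and it assumes
the signature on the .dsc has already been checked separately with gpgv.
It relies only on the Checksums-Sha256 field of the .dsc, whose lines each
hold a digest, a size, and a file name.

```python
import hashlib
from pathlib import Path


def parse_sha256_entries(dsc_text):
    """Return {filename: (sha256_hex, size)} from the Checksums-Sha256 field.

    Hypothetical helper. It trusts dsc_text, so the caller is assumed to
    have verified the signature (for example with gpgv) beforehand.
    """
    entries = {}
    in_field = False
    for line in dsc_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            # Continuation lines start with a space; anything else ends the field.
            if not line.startswith(" "):
                break
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
    return entries


def verify_files(dsc_text, directory):
    """Return the names of listed files that are missing or do not match."""
    mismatches = []
    for name, (digest, size) in parse_sha256_entries(dsc_text).items():
        path = Path(directory) / name
        if not path.is_file():
            mismatches.append(name)
            continue
        data = path.read_bytes()
        if len(data) != size or hashlib.sha256(data).hexdigest() != digest:
            mismatches.append(name)
    return mismatches
```

An empty list from verify_files means every file listed in the signed
.dsc is present with the expected size and digest.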
The dgit-repos server similarly archives the signed Git tag with the Git
tree over which it is a signature, ensuring that this is independent of
Salsa where the tag could potentially be deleted by someone. This is not
in the archive, of course, but I don't see any technical reason why some
version of that data couldn't also be uploaded to the archive if one
wanted to use the archive as a highly distributed backup of the dgit-repos
server. There is, however, the long-standing concern about any variation
on the 3.0 (git) source package format that the Git tree the maintainer
signed may contain non-free code somewhere in its history.
So here too, I'm not sure that this is inherently a blocker, although in
the past the FTP team has been reluctant to include in the archive the
data that is required to preserve a complete record of what is signed by a
Git tag. (One obvious potential solution is to only put a shallow clone
in the archive, so you can verify the signature but some of the
content-addressable store references are unresolved.)
> Unsure those are the right words. We want to have the uploader create a
> signature over the content they want to have appear in the archive. In a
> way, that this signature can be taken and placed beside the source, and
> then independently verified. *Currently* this is done using .dsc files.
Okay, so again I think it's easier to talk about specifics, so let me make
this concrete by using myself as the use case.
I use the git-debrebase workflow for maintaining most of my Debian
packages. What this means, for those who aren't familiar with it, is that
my workflow looks like this (this is idealized; I'm still migrating my
packages fully to this workflow so the specifics currently vary somewhat):
1. I start a new package by creating a new Git branch based on the Git tag
of the latest upstream release. I then add the debian/* directory with
packaging files and commit that directly to the resulting branch.
2. I work on the package, freely making commits to both the debian/* files
and the upstream source to fix problems and adjust the software for
Debian. The only constraint that I have to follow is that I can't make
a commit that changes both files in debian/* and files outside of
debian/* at the same time. Other than that, I can treat this branch
like a completely normal Git branch and do development like I would in
any other Git repository, without doing anything special for the Debian
packaging.
3. When upstream releases a new release, I can *rebase* my changes on top
of the new upstream release rather than doing a merge with all the
messiness that a merge involves. For me, this is huge. I can fully
drop upstream changes that have been merged upstream, rework changes
that need to be done differently based on upstream changes, and don't
have to wrestle with a long and messy merge history with conflict
resolutions that grows over time. Instead, I can always see a simple
list of the changes that I've applied to the current upstream release.
This is exactly the workflow that I use with other development forks in
Git with non-Debian packages. (I do have to remember to run git
debrebase conclude here to make the magic work.)
4. When I'm ready to upload, currently I run dgit locally. dgit looks at
my Git repository, finds all of the commits that modify the upstream
source, extracts the commit metadata, creates nice patches based on
those commits with proper metadata taken from the Git commit metadata,
and uses them to construct a normal 3.0 (quilt) source package. Anyone
working with the source package can treat it exactly like any other 3.0
(quilt) source package and has no need to care that I use the
git-debrebase workflow.
Making all of this work involves some Git trickery that I know some people
dislike (for example, all of this is serialized as a sequence of Git
changes that are fast-forwardable *including the rebases*, which is dark
magic), but for me this is an excellent workflow. The development
experience matches my mental semantics of the relationship between the
Debian packaging and the upstream code, and the Git trickery is all hidden
from me behind a nice interface.
Now, I would like to use tag2upload rather than using dgit locally to make
the upload. I want to move my testing into Salsa CI so that my overall
workflow more closely matches the way that I do all of my development in
my day job. Salsa CI is great about not getting lazy and skipping test
steps just because I am in a hurry to get a package uploaded, and I can
capture every test that was useful and not have to remember to re-run it.
(This is the part that I haven't done yet; I know I want to do it and have
not yet found the time.)
What signed artifact do I need to provide so that the FTP team will be
comfortable accepting my tag2upload-built source package?
Note, importantly, that the source package contains things that are not in
the files present in the working tree of a local Git checkout of my source
package. The patch descriptions and committer information and related
metadata are where they are supposed to be in Git: in the metadata for the
corresponding Git commit, not in a file in my working tree. The
transformation that puts that data into a 3.0 (quilt) source package is
not rocket science, but it's not trivial either.
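To give a feel for the shape of that transformation, here is a toy sketch.
It is emphatically not dgit's or tag2upload's actual algorithm: the Commit
class, the patch naming scheme, and the header layout are all assumptions
made up for illustration. The point is only that the patch headers come
from commit metadata, not from any file in the working tree.

```python
import re
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical stand-in for the Git commit data a real tool would read."""
    author: str
    subject: str
    body: str
    diff: str
    touches_upstream: bool  # False for commits that only change debian/*


def patch_name(index, subject):
    """Build a file name such as 0001-fix-build.patch from a commit subject."""
    slug = re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")
    return f"{index:04d}-{slug}.patch"


def build_patch_series(commits):
    """Return {path: text} for a debian/patches directory built from commits."""
    files = {}
    series = []
    upstream = [c for c in commits if c.touches_upstream]
    for index, commit in enumerate(upstream, start=1):
        name = patch_name(index, commit.subject)
        # The patch header is taken from the commit metadata.
        header = f"From: {commit.author}\nSubject: {commit.subject}\n\n"
        if commit.body:
            header += commit.body.rstrip("\n") + "\n\n"
        files[f"debian/patches/{name}"] = header + commit.diff
        series.append(name)
    files["debian/patches/series"] = "".join(n + "\n" for n in series)
    return files
```

Commits that only touch debian/* produce no patch here, which mirrors the
constraint in my workflow that a single commit never mixes the two.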
The signed artifact that I'm naturally providing is a signature across the
entire Git tree, which includes all of the history and thus all of the
data that goes into the source package. So everything that goes into the
source package *is signed*, by me, when I trigger a tag2upload upload.
The problem comes when dak wants to verify the correspondence between that
data structure and the source package. It certainly can verify that my
Git tag is valid and it can verify that the tag specifies the correct
source package, version, and so forth. But if it wants to verify that the
construction of the debian/patches/* directory is correct, I think it
would have to perform the same transformation on my Git history that dgit
and tag2upload perform.
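In other words, the check would have the shape "rebuild, then compare". A
minimal sketch, assuming a hypothetical transform callable that stands in
for whatever dgit and tag2upload really do, and representing both source
trees as simple path-to-content mappings (nothing here is dak's actual
interface):

```python
def compare_source_trees(rebuilt, uploaded):
    """Compare two {path: content} mappings and list every discrepancy."""
    problems = []
    for path in sorted(set(rebuilt) | set(uploaded)):
        if path not in uploaded:
            problems.append((path, "missing from upload"))
        elif path not in rebuilt:
            problems.append((path, "not derivable from signed tree"))
        elif rebuilt[path] != uploaded[path]:
            problems.append((path, "content differs"))
    return problems


def verify_upload(signed_history, uploaded, transform):
    """Re-run the (hypothetical) transform on signed data and compare results.

    An empty list means the uploaded tree matches what the transformation
    produces from the signed history.
    """
    return compare_source_trees(transform(signed_history), uploaded)
```

This only works if the transformation is deterministic, which is the same
property that makes it possible for anyone else to repeat the check.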
> I basically assume that the uploader *does* need to have their source
> locally, no matter what. (Their git cloned).
Yes, I agree. I don't think there's any way to avoid this: the source has
to be in the same place that the key is in, or close to in the case of
secure key storage, in order for the uploader to sign it and know what
they are signing.
> I also do assume that the uploader will build things, to see if the
> stuff they are going to "push to the archive" (and our users) actually
> does what they intended it to do - and to test it.
This is the assumption that I think is no longer valid given Salsa CI. It
used to be that this was the only way to test a package; now we can do
equally well and often better by letting Salsa CI do the hard work.
> Well, if the maintainers system is broken in, it makes no difference if
> a git tag or a dsc or whatever else is signed.
This is more true than I would like it to be, and in the case of a Debian
maintainer who doesn't have any sort of hardware key storage and does all
their Debian development on the same system that they read mail, browse
the web, open random downloaded PDFs, try random software, etc., I think
this is true and it's one of the things that I worry about with our
existing security model.
However, I don't think this is *necessarily* true for all maintainers, and
tag2upload creates the *possibility* of doing better. Whether we will
take advantage of that possibility, I don't know. But creating a
tag2upload tag requires GnuPG and Git and not much else, and other people
can see exactly the Git contents that were signed.
Better security models are possible even with *.dsc files, of course, but
I think tag2upload opens the door for a few additional improvements such
as moving the source package construction off the maintainer's system,
and, more importantly, forces exactly the content that was signed to be
uploaded to Salsa, which provides that data in a somewhat richer form that
gives us some additional detection and tracing capabilities.
--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>