
Upstream dist tarball transparency (was Re: Validating tarballs against git repositories)



Hi!

On Fri, 2024-03-29 at 23:29:01 -0700, Russ Allbery wrote:
> On 2024-03-29 22:41, Guillem Jover wrote:
> > (For dpkg at least I'm pondering whether to play with switching to
> > doing something equivalent to «git archive» though, but see above, or
> > maybe generate two tarballs, a plain «git archive» and a portable one.)
> 
> Yeah, with my upstream hat on, I'm considering something similar, but I
> still believe I have users who want to compile from source on systems
> without current autotools, so I still need separate release tarballs.
> Having to generate multiple release artifacts (and document them, and
> explain to people which ones they want, etc.) is certainly doable, but I
> can't say that I'm all that thrilled about it.
> 
> I think with my upstream hat on I'd rather ship a clear manifest (checked
> into Git) that tells distributions which files in the distribution tarball
> are build artifacts, and guarantee that if you delete all of those files,
> the remaining tree should be byte-for-byte identical with the
> corresponding signed Git tag.  (In other words, Guillem's suggestion.)
> Then I can continue to ship only one release artifact.

I've been pondering this and I think I might have come up with a
protocol that to me (!) seems safe, even against a malicious upstream. It
does not require two tarballs, which as you say seems cumbersome and is
harder to explain to users. But I'd like to run this through the list
in case I've missed something obvious.

I've implemented a prototype for dpkg, in the branch:

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/log/?h=next/dist-transparency

For context, dpkg dist tarballs have for a long time shipped a
«.dist-version» file; I think some GNU projects started to do something
similar but with a different name.

The relevant commits:

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=54a6ad9db3da335a40fed9020195864c4a87bdc1
    (Add .dist-vcs-id, in git main already)

    At least for dpkg, if «make dist» is run from outside a tag, then
    the version will include the commit and whether the working dir
    was dirty, but from a tag, only the version is included and there's
    no link to which commit the tag was pointing to at that time. This
    file adds that link, regardless of the current commit, and its
    content is printed as part of the configure summary.

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=1944a90d13c7c63592c438e550a212ab9e3aad76
    (Remove VCS specific files from dist)

    Simplifies the comparisons.

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=39d181e60b3413c58a72056beec0a5a6f584cd92
    (Add .dist-vcs-url)

    This adds a new file to track the upstream VCS URL, so that it can
    be used from a deterministic place, for verification purposes.

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=b3d7e0f195bd69b4622121e78fce751ea76dc0bc
    (Add .dist-vcs-files)

    This adds a new file with the list of files *in* the VCS, so that
    we can get back to that clean state, even from a distributed
    tarball, or from an extracted directory with built artifacts.

    I also thought about listing the autogenerated files as Russ
    mentions, but that seems error-prone and non-exhaustive, because
    those might depend on the version of the autotools (or other build
    system) used, and such a list would not cover artifacts created
    during the build phase, which could be used to smuggle things in.

    This last commit lists the three operations that all this makes
    possible:

      * list differences in file lists (should be none)
      * list differences in file contents (should be none)
      * reset the directory into a state like the VCS (except
        for the VCS tracking/supporting files)

    These operations are fairly generic; the one thing I could see
    being "configurable" is the set of VCS files to exclude, maybe via
    another file, but I've not thought about the consequences here.
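
To make the three operations concrete, here is a rough sketch in Python.
It is not the set of snippets from the branch. It assumes that
«.dist-vcs-files» holds one relative path per line, that an extracted
tarball (dist_dir) is compared against a checkout of the recorded commit
(vcs_dir), and that «.git» is the only VCS supporting entry to exclude.
All function and parameter names (including «keep») are made up for
illustration.

```python
import os
from pathlib import Path


def read_manifest(dist_dir, name=".dist-vcs-files"):
    """Read the file list; assumes one relative path per line."""
    lines = Path(dist_dir, name).read_text().splitlines()
    return {line for line in lines if line}


def list_tree(root, excludes=(".git",)):
    """Return relative paths of all files (and symlinks) under root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories are reported as entries, not descended into.
        links = [d for d in dirnames
                 if os.path.islink(os.path.join(dirpath, d))]
        dirnames[:] = [d for d in dirnames
                       if d not in excludes and d not in links]
        for name in filenames + links:
            if name in excludes:
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def diff_file_lists(manifest, vcs_files):
    """Operation 1: paths only in the manifest, paths only in the VCS."""
    return sorted(manifest - vcs_files), sorted(vcs_files - manifest)


def diff_contents(dist_dir, vcs_dir, manifest):
    """Operation 2: listed files whose bytes differ or that are missing."""
    changed = []
    for rel in sorted(manifest):
        a, b = Path(dist_dir, rel), Path(vcs_dir, rel)
        if (not a.is_file() or not b.is_file()
                or a.read_bytes() != b.read_bytes()):
            changed.append(rel)
    return changed


def reset_to_vcs_state(dist_dir, manifest, keep=()):
    """Operation 3: delete every file that is not listed in the manifest.

    'keep' is a hypothetical escape hatch for files the build still needs.
    Empty directories are left in place.
    """
    extra = list_tree(dist_dir, excludes=()) - manifest - set(keep)
    for rel in sorted(extra):
        os.remove(os.path.join(dist_dir, rel))
    return sorted(extra)
```

A verifier would call read_manifest() on the extracted tarball, expect
both diff functions to report nothing, and only then (or independently)
call reset_to_vcs_state() before building.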


I think this is safe (in the sense of detecting smuggled artifacts or
modifications in the dist tarball not present in the VCS, but certainly
not against modifications or artifacts smuggled into the VCS), because a
user that wants to verify any of this can make sure the URL is the
expected one, and everything else seems to follow from there; otherwise
you should get differences. (Thinking now, perhaps one of the checks
should be whether the expected tag or branch matches the commit id?)
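
The tag check mentioned in the parenthesis could look roughly like the
sketch below. It assumes that «.dist-vcs-id» contains a full commit hash
and that the verifier has captured the output of «git ls-remote <url>»
for the expected URL; that output has one "<object id><TAB><ref>" line
per ref, with an extra "<ref>^{}" line giving the commit behind an
annotated tag. The function names are invented for illustration.

```python
def parse_ls_remote(output):
    """Map ref names to object ids from «git ls-remote» output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        oid, _, ref = line.partition("\t")
        refs[ref.strip()] = oid.strip()
    return refs


def commit_for_tag(refs, tag):
    """Return the commit a tag points to, or None if the tag is unknown.

    Annotated tags have their own object id; the peeled "^{}" entry holds
    the commit, so it takes precedence when present.
    """
    name = f"refs/tags/{tag}"
    return refs.get(name + "^{}", refs.get(name))


def tag_matches_dist(refs, tag, dist_vcs_id):
    """Check that the expected tag resolves to the recorded commit id."""
    commit = commit_for_tag(refs, tag)
    return commit is not None and commit == dist_vcs_id.strip()
```

The expected tag name has to come from somewhere the tarball does not
control (for example the version the user asked for), otherwise the check
proves nothing.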

This is currently tailored to a Debian native package or to just handling
the upstream part with no packaging, but I don't think it would be much
work to integrate this into packaged upstreams (mostly excluding whatever
is in the debian.tar parts?), or even to use something like this for
an upstream that does not provide these files by adding equivalent files
or metadata in the packaging.

The only things that one would need to trust are the invocations to
perform those actions, which should *not* be part of the distributed
tarball. I'm thinking of perhaps creating a new git repo containing
those snippets so that users can use them directly or as reference
implementations. Or perhaps include some of this within dpkg (but then
those could not be used to verify dpkg itself :D).

For dpkg, I'm considering merging this, and then performing the resetting
during the package build. Eventually perhaps this could be added as
a feature in dpkg-source?

Thanks,
Guillem

