Re: Security review of tag2upload

To: Antoine Beaupré <anarcat@debian.org>
Cc: debian-vote@lists.debian.org
Subject: Re: Security review of tag2upload
From: Russ Allbery <rra@debian.org>
Date: Wed, 12 Jun 2024 11:22:39 -0700
Message-id: <[🔎] 878qzaggwg.fsf@hope.eyrie.org>
In-reply-to: <[🔎] 87y17a2iwn.fsf@angela.anarc.at> ("Antoine Beaupré"'s message of "Wed, 12 Jun 2024 13:03:04 -0400")
References: <[🔎] 87wmmv7xdz.fsf@hope.eyrie.org> <[🔎] 87y17a2iwn.fsf@angela.anarc.at>
Antoine Beaupré <anarcat@debian.org> writes:
> On 2024-06-11 18:39:04, Russ Allbery wrote:

>> Below is the security review that I did of the tag2upload design.

> Hi Russ, and thank you so much for taking the time to do this excellent
> work. It's really comforting to think that we have an actual
> professional look at our stuff, and I think we should do this more often
> and systematically. :)

Thank you, that's very kind!

>> I will also post this review on my web site, probably later tonight if
>> I have time.

> I didn't find that post, btw... No big deal of course!

Ah, yes, indeed, I once again overestimated the amount of time I'd have
available in the evening and used it all responding to the thread instead.
Maybe tonight?

[... lots of trimming throughout ...]

>> Neither the existing upload mechanism nor tag2upload attempt to prevent
>> or detect (as opposed to trace) the upload of a malicious source
>> package by someone in full possession of a key in the keyring, so this
>> threat is not considered in this document, although tracing for this
>> threat is discussed briefly.

> I'm actually curious as to why that is treated as a separate
> possibility, because if kind of overlaps with the second model ("someone
> uploads a malicious package appearing from someone else")...

I think it's important to note that, in both the old and new system, if
someone compromises a Debian uploader's key, they can upload malicious
packages signed with that key and we have no systematic prevention or
detection for that.  It's a bound on the security problem that we're
trying to solve.  We're still trusting the OpenPGP keys of people with
upload rights.

The second model is the impersonation case: while someone with a
compromised key is going to be able to upload packages signed with that
key, they shouldn't be able to hide by making the packages appear to be
signed with a different key.  This is important for tracing the damage
once we realize the key has been compromised, and also for ensuring
authorization checks are done correctly in the Debian Maintainer case.

> For me, that case and the "xz-utils" case are actually quite pressing
> matters, they don't quite keep me up at night, but they're the kind of
> threat models I do worry about and that we should address head on. But
> *maybe* this is not the right vector to address them, that said...

I think tag2upload is not quite the right vector to address that specific
problem, although it provides some assistance.  In that case, upstream
injected malicious code into the upstream release.  To the extent that the
Debian uploader trusts that code and does not detect that it's malicious,
we're going to continue to be vulnerable to that attack.  It is extremely
difficult to prevent entirely.

tag2upload offers some benefit here in that it forces all of the code into
Git.  That does not make it any less malicious, obviously, but it *might*
make it easier to detect, because diffs in Git are somewhat easier to
analyze than tarballs.

That depends entirely on how much the maintainer audits the upstream code,
though.  If the maintainer blindly imports upstream code into the Git
repository and then builds and uploads a source package (the current
system) or signs a Git tag of that code (the tag2upload system), we're
still going to get malicious code in the archive.  And I have certainly
been that maintainer at some points in the past.  There's a lot of work to
do in Debian and it is extremely hard to keep up while still reviewing and
understanding every line of upstream's changes.

>> The sandboxing design of the tag2upload server does a good job of
>> reducing that risk. Signatures are checked early, so only attackers
>> able to create a valid OpenPGP signature with a key in the keyring can
>> attack the most security-sensitive part of the system. The signing key
>> is isolated from both the component that processes incoming requests
>> from Salsa and the component that constructs the source package, only
>> interacting with them via a restricted protocol.

> This is more a question to the dgit people, but what kind of hardening
> do we have on the tag2upload server? I think dak has the cryptographic
> keys in a HSM (Hardware Security Module) to prevent a threat actor from
> grabbing those keys for offline attacks...

> Is there something similar (HSM or YubiKey) on the tag2upload server? If
> not, why?

I believe that using an external hardware key for the tag2upload signing
key is indeed the plan.

>> We don't have reproducible source package builds today, so this is not
>> a regression. We currently blindly trust whatever the uploader uploads,
>> and the tag2upload proposal does not make that risk worse, merely
>> shifts it to central infrastructure. I therefore don't consider
>> reproducible source builds to be a security prerequisite for adoption
>> of the tag2upload proposal. It is, however, obvious follow-on work that
>> would improve detection of some classes of attacks.

> Does tag2upload make reproducible source packages harder?

I personally have not done any work on reproducible source packages and
therefore do not know the exact shape of the problem or the unexpected
pitfalls, but just intuitively I believe tag2upload would make
reproducible source packages easier for those packages that are uploaded
via tag2upload.  tag2upload builds the source package in a well-known,
controlled, reproducible environment with known tool versions and a
specific implementation of the process for turning a Git repository into a
source package.  One should therefore be able to get a long way towards
reproducing those source packages by starting with a second tag2upload
server with the same code but running in a separate security domain.

My understanding is that variations in the output of git archive, tar, and
compression programs mean that reproducible source packages are probably
best approached by trying to make the contents of the tarball reproducible
rather than the compressed tarball itself.  This does miss some amount of
attack surface, though.  This seems to be a general problem that's mostly
orthogonal to tag2upload.  If anything, I'd guess the tag2upload process
would make extending reproducibility to the compressed tarball itself
somewhat easier since it can record precise tool versions.  But that may
not be enough if the same version of the tool can produce different output
given the same Git tree.

>> I suspect (but am not certain) that this attack would normally be
>> prevented by the Salsa Git service. The benign tree already existed in
>> the same repository with the referenced commit ID (presumed to be
>> checked by the sponsor during review), and even if references to that
>> object are deleted via branch deletion, I believe Git will reject the
>> push of the malicious commit ID until the old objects have been
>> garbage-collected.  This presumably will take long enough that the
>> tag2upload process will fail because the upstream commit is missing.

> Yeah, that's a reasonable assumption, but I believe those jobs are ran
> on a schedule on GitLab servers, so an attacker could *time* their
> attack just so, to make sure the old tree gets GC'd just in time. It's a
> heck of a race to win though, especially, since you need to time it on
> the other side as well...

I was hoping that the Salsa scheduled cleanup tasks would use git gc with
close-to-default options, which among other things means a prune threshold
of two weeks.  I am not an expert in the Git storage format, but I believe
that means unreferenced objects would still be retained for two weeks and
thus would continue to block a push of new objects with the same SHA-1
hash for that period of time, which should be more than enough time to
prevent this race.

> I wonder if you've considered the "we need to revoke access to
> compromised/hostile developer" threat model. Right now, we have a
> relatively centralized model here (modulo DAM, dak, debian-keyring), and
> we're introducing a new component... How does tag2upload manage keys and
> does it introduce additional response time or issues when revoking
> access to retiring or revoked developers?

The intent of the design is that the authorization checks done in
tag2upload are for its own security boundary and are redundant with checks
performed by dak, with the exception of verifying the Git tag signature
and recording the fingerprint of the signed key.  (That necessarily has to
be done by the tag2upload server.)

Therefore, in the case you describe, even if tag2upload was slow in
receiving an updated keyring and thus accepted a package from someone
whose access was revoked, the only thing that person could do is induce
tag2upload to construct the source package and pass it to dak.  dak would
then retrieve the key fingerprint from the source package metadata,
perform its normal authorization checks, see that the uploader's access
has been revoked, and reject the package.

Rejected packages are still recorded in the dgit-repos archive server by
design (this is true of how it's being used today with dgit), so after a
package reject the source version will have to be changed before it can be
uploaded again.

> Having uploads in Git brings a whole set of interesting properties and
> tools we could leverage there as well, to ensure the integrity of the
> dgit repository itself. When Tor transitioned from gitolite to GitLab,
> one of the concerns was exactly that kind of problem space where we're
> not sure we want to trust GitLab with our code. So I did a significant
> amount of work researching Git integrity solutions, and my findings are
> documented here:

> https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/gitlab#git-repository-integrity-solutions

> The work the kernel.org folks have been doing about publishing a
> transparency log for the kernel git repository might be particularly
> relevant here.

This is a great analysis.  Thank you!

git-evtag seems like the system that's the most relevant to the specific
tag2upload problem, although it has the substantial drawback of requiring
a non-standard tool and tag format.

-- 
Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>
Reply to:
References:
- Security review of tag2upload
  - From: Russ Allbery <rra@debian.org>
- Re: Security review of tag2upload
  - From: Antoine Beaupré <anarcat@debian.org>
Prev by Date: Re: [RFC] General Resolution to deploy tag2upload
Next by Date: Re: [RFC] General Resolution to deploy tag2upload
Previous by thread: Re: Security review of tag2upload
Next by thread: Re: Security review of tag2upload
Index(es):
- Date
- Thread