
tag2upload (git-debpush) service architecture - draft

Hi all.

I wrote this draft design doc / deployment plan for the tag-to-upload
service, perhaps best summarised by Sean like this:

  We designed and implemented a system to make it possible for DDs to
  upload new versions of packages by simply pushing a specially
  formatted git tag to salsa.

  Please see this blog post to learn about how it works:

The server side of this is not running yet and there is some work to
do for that.

We've had a number of peripheral conversations, and informal
internal reviews, but I think now is the time to have a public
design review etc.  I'm CCing this to -devel because I just did a
lightning talk demo of the prototype and IME many people are
interested in these kinds of questions.

Right now this document is maintained here:
but NB that that is a potentially rewinding branch.  (I probably won't
rewind it until it's time to fold it into master at which point I may
just delete it.)



Overall structure and dataflow

 * Uploader (DD or DM) makes signed git tag (containing metadata
   forming instructions to tag2upload service)

 * Uploader pushes said tag to salsa. [1]

 * salsa sends webhook to tag2upload service.

 * tag2upload service
    : provides an HTTPS service accessible to salsa's IP addrs
    : fishes url and tag name out of webhook json
    ! checks that url is basically sane
    - retrieves tag data (git shallow clone)
    ! parses the tag metadata
    ! checks to see if it is relevant
    ! verifies signature
    ! checks to see if signed by DD, or DM for appropriate package
    - obtains relevant git history
    - obtains, if applicable, orig tarball from archive
    - makes source package
    # signs source package and "dgit view" git tag
    - pushes history and both tags to dgit git server
    - uploads source package to archive

 * archive publishes package as normal

[1] In principle other git servers would be possible but it would have
to be restricted to ones where we can either avoid, or stop, them
being used as a channel for a DoS attack against the tag2upload
service.
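
As a rough illustration of the "fishes url and tag name out of webhook
json" step, here is a minimal Python sketch.  The field names
(object_kind, ref, project.git_http_url) are my reading of GitLab's
tag push event and should be checked against the real payloads salsa
sends; the function name extract_url_and_tag is made up for this
example and is not part of any existing tool.

```python
import json


def extract_url_and_tag(body: bytes):
    """Return (url, tag) from a tag-push webhook body, or None if irrelevant.

    Assumes GitLab-style fields; nothing here is trusted yet, the result
    still has to go through the url/tag sanity checks.
    """
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(event, dict) or event.get("object_kind") != "tag_push":
        return None
    ref = event.get("ref")
    project = event.get("project")
    if not isinstance(ref, str) or not isinstance(project, dict):
        return None
    url = project.get("git_http_url")
    prefix = "refs/tags/"
    if not isinstance(url, str) or not ref.startswith(prefix):
        return None
    return url, ref[len(prefix):]
```

Anything that is not a well-formed tag push is simply dropped, so the
application server never forwards half-understood input to the trusted
daemon.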

Service architecture

I propose the following architecture for the tag2upload service.

 * Packet filter limiting incoming connections to those from salsa.

 * Conventional webserver offering TLS and using Let's Encrypt.
   (Alternatively, HTTP could be used, but in the future we
   might want to handle embargoed security uploads so let's not.)

 * Web-service-style "application server" written in some scripting
   language listens on a local TCP port, handles HTTP connections
   proxied by the webserver, parses the JSON, and connects to:

 * Trusted service daemon.  Listens on a TCP connection and accepts a
   simple line-based "url tag" protocol.  Checks urls and tags for
   basic syntax and sanity (eg that it has the right protocol and
   host).  Keeps track of incoming requests in a sqlite3 database so
   that execution can be deferred and retried as applicable.  Spawns
   per-request worker children.

 * Request processor.  Trusted.  Does the trusted parts above.

 * Some VM or container or maybe chroot.  Instantiated by request
   processor via adt-virt protocol.  Request processor controls this
   by sending it commands (via the adt-virt facility for this).

 * In the VM, git is used to fetch all the bits and dgit does the
   actual source package generation work.

 * Trusted service daemon needs access to its GPG key which should be
   on a hardware token and not accessible to the VM instances.
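
To make the trusted service daemon's job a bit more concrete, here is
a minimal Python sketch of parsing one "url tag" line, checking it for
basic sanity, and recording it in sqlite3 so that it can be deferred
and retried.  Everything specific here is an assumption for the sake
of illustration: the allowed host string, the tag character set, the
table layout, the retry limit and all the function names.  The real
daemon may well look quite different.

```python
import re
import sqlite3
from urllib.parse import urlsplit

ALLOWED_SCHEME = "https"
ALLOWED_HOST = "salsa.debian.org"  # assumption: the only acceptable host
# Hypothetical conservative tag syntax; not the real git-debpush rules.
TAG_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._+~%/-]*")


def parse_request(line: str):
    """Parse one 'url tag' line; return (url, tag), or None if not sane."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 2:
        return None
    url, tag = parts
    try:
        split = urlsplit(url)
    except ValueError:
        return None
    # Comparing netloc exactly also rejects userinfo and explicit ports.
    if split.scheme != ALLOWED_SCHEME or split.netloc != ALLOWED_HOST:
        return None
    if split.query or split.fragment or ".." in split.path:
        return None
    if not TAG_RE.fullmatch(tag) or ".." in tag:
        return None
    return url, tag


def open_queue(path=":memory:"):
    """Open (and if necessary create) the request database."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS requests (
               id INTEGER PRIMARY KEY,
               url TEXT NOT NULL,
               tag TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'queued',
               attempts INTEGER NOT NULL DEFAULT 0,
               UNIQUE (url, tag))"""
    )
    return db


def enqueue(db, line):
    """Record a request line; return False if it failed the sanity checks."""
    request = parse_request(line)
    if request is None:
        return False
    with db:  # commits on success
        db.execute(
            "INSERT OR IGNORE INTO requests (url, tag) VALUES (?, ?)", request
        )
    return True


def next_request(db):
    """Return (id, url, tag) of the least-retried queued request, or None."""
    return db.execute(
        "SELECT id, url, tag FROM requests WHERE status = 'queued' "
        "ORDER BY attempts, id LIMIT 1"
    ).fetchone()


def record_failure(db, request_id, max_attempts=3):
    """Count a failed attempt; give up after max_attempts failures."""
    with db:
        db.execute(
            "UPDATE requests SET attempts = attempts + 1, "
            "status = CASE WHEN attempts + 1 >= ? THEN 'failed' "
            "ELSE 'queued' END WHERE id = ?",
            (max_attempts, request_id),
        )
```

Ordering by attempts means that one repeatedly failing request cannot
starve fresh ones, which matters for the DoS considerations.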


The tag2upload service will have to have a signing key that can upload
source packages to the archive.

We do not want that signing key to be abused.  In particular, even
though it will be in a hardware token we want to avoid giving
unrestricted access to that key to code which also has a large attack
surface.  In particular, source package construction is very complex.

So there will be a privilege separation arrangement, as described
above.  Different tasks run in different security contexts, marked in
the dataflow list above as follows:

    ! is fully trusted and has access to the signing key

    - runs in the discardable VM or container, controlled by `!'

    # is achieved by the `dgit rpush' protocol, where the trusted
      (invoking, signing) part offers a restricted signing oracle to
      the less-trusted (building) part.  The signing oracle will check
      that the files to be signed are roughly in the right form and
      that they name the right source package.  It will construct the
      "dgit view" git tag itself from metadata provided by the
      building part.

    : can run as different unix users or even different VMs or
      something, if desirable
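
To give a flavour of what "roughly in the right form" and "name the
right source package" could mean for the signing oracle, here is a
minimal Python sketch of a check on an unsigned .dsc.  It is not what
`dgit rpush' actually does.  The function names and the particular
checks (Source and Version match what the trusted part expects, every
entry in Files is a plain filename starting with "<source>_") are
assumptions for illustration, and the deb822 parsing is deliberately
simplistic (for example it treats field names as case-sensitive).

```python
def parse_deb822_paragraph(text):
    """Parse the first deb822 paragraph of text into a dict.

    Continuation lines are joined to their field with newlines.
    Raises ValueError on anything that does not look like deb822.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # blank line ends the paragraph
            continue
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep or not name or name in fields:
                raise ValueError("malformed or duplicate field")
            current = name
            fields[current] = value.strip()
    return fields


def dsc_is_acceptable(dsc_text, expected_source, expected_version):
    """Decide whether the oracle should agree to sign this .dsc."""
    try:
        fields = parse_deb822_paragraph(dsc_text)
    except ValueError:
        return False
    if fields.get("Source") != expected_source:
        return False
    if fields.get("Version") != expected_version:
        return False
    # Each Files entry is "<md5sum> <size> <filename>".
    entries = [l.split() for l in fields.get("Files", "").split("\n") if l.strip()]
    if not entries:
        return False
    for entry in entries:
        if len(entry) != 3:
            return False
        filename = entry[2]
        if "/" in filename or not filename.startswith(expected_source + "_"):
            return False
    return True
```

The point of the design is that the expected source package and
version come from the trusted part's own reading of the signed tag,
never from the building part.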

Reproducibility, metadata and auditing

The trusted part of the tag2upload service will keep some logs,
particularly of each tag it is told about and what the disposition of
that was, and when it was retried.

Also, it will send the following information to a public mailing list:
  - The tag object data for any tag it decides to process,
     before it passes it to the VM.
  - A report (more or less, a shell transcript)
     of each processing attempt
  - The list will also be the public email address of the
     tag2upload robot's signing key

The generated .dscs will contain additional fields:

  Git-Tag-Tagger: Firstname Surname <email@address>

      "tagger" line from the git tag converted to deb822 format

  Git-Tag-Info: tag=<tagobjid> fp=<fingerprint> algos=1,8

      <tagobjid> is the git object ID of the tag object
          (if someone wants to find this, it can be found on the
           dgit git server)

      <fingerprint> is the "fingerprint_in_hex" from the VALIDSIG line
      in the gpgv output.  algos is the <pubkey-algo> and <hash-algo>
      (here, 1,8 as examples).
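
As an illustration of how Git-Tag-Info could be assembled, here is a
minimal Python sketch which picks the relevant values out of gpgv's
machine-readable status output (as produced with --status-fd).  The
argument positions follow the VALIDSIG line as documented by GnuPG:
fingerprint first, then pubkey-algo and hash-algo as the 7th and 8th
arguments.  The function name git_tag_info is hypothetical.

```python
def git_tag_info(tagobjid, gpgv_status_output):
    """Build the Git-Tag-Info field from gpgv status output.

    Raises ValueError if there is no usable VALIDSIG line.
    """
    for line in gpgv_status_output.splitlines():
        words = line.split()
        if words[:2] != ["[GNUPG:]", "VALIDSIG"]:
            continue
        args = words[2:]
        if len(args) < 8:
            raise ValueError("truncated VALIDSIG line")
        # args: fingerprint, creation date, sig timestamp, expire timestamp,
        #       sig version, reserved, pubkey-algo, hash-algo, ...
        fingerprint, pubkey_algo, hash_algo = args[0], args[6], args[7]
        return (
            f"Git-Tag-Info: tag={tagobjid} fp={fingerprint} "
            f"algos={pubkey_algo},{hash_algo}"
        )
    raise ValueError("no VALIDSIG line in gpgv status output")
```

A caller would pass the tag object ID it already knows from fetching
the tag, together with the status text captured from gpgv.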

This additional metadata is necessary to be able to tell by looking at
the .dsc who the original uploader was (which might be different to
the maintainer, in the sponsorship case).  (Programs which use the
uploader signature identity will send mails to the mailing list
mentioned above, until they have been updated.  This is not desirable
but not a blocker for deployment.)

The generated .changes will contain copies of the two .dsc fields
above.

The upload will contain a .source_buildinfo.  This will list the
versions of the software running in the VM, which is primarily what
controls the generated .dsc.

It will also list the versions of dgit-infrastructure and git running
in the trusted part, because the trusted part assembles the tag lines
etc. and interprets the git tag.

Eventually hopefully there will be a mode for sbuild (related to
binary build reproduction), or a suitable script, which can verify a
reproduction attempt.  For now the src:dgit test suite will check that
the upload is reproducible if run again in the same environment.


This service is not very resistant to DoS attacks.  In particular,
sending it bad URLs might stall it (since it has to retry failing
fetches).

So we (i) do not expose it to anyone but salsa and (ii) limit it to
trying to fetch salsa urls.

Making very many tags on salsa would stress this tag2upload service a
bit but not fatally, and it would be a DoS against salsa too.

After signature verification, we are much more vulnerable to DoS.  An
approved signer can get the service to do a lot of work.  That is the
purpose of the service, indeed.

Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
