
Archive database (projectb) queries for the public



Part I:  Making it easier to get data out of the dak db
-------------------------------------------------------

I just spoke to Mark Hymers about some queries that dgit needs to make
to the projectb.  At the moment, dgit does "ssh coccia psql blah"
which is obviously rather fragile.  It also doesn't work for non-DDs.

Mark proposed that the right answer would be to have an
internet-facing service on coccia (with appropriate service name)
which would make projectb queries.  Each particular query would have a
name in this service, and the caller would say "please make query foo
with args bar, baz, ...".
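
Just to make the shape of that concrete, a request might look
something like the sketch below.  The URL, query name and argument
layout are all invented for illustration; nothing like this exists
yet and the dak maintainers would pick the real conventions.

    import json
    import urllib.request

    # Hypothetical endpoint and query name, purely for illustration.
    body = json.dumps({"query": "source-in-suite",
                       "args": ["dgit", "jessie"]}).encode("utf-8")
    req = urllib.request.Request(
        "https://coccia.debian.org/dak-query", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        rows = json.loads(resp.read().decode("utf-8"))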

When someone wants some new kind of query, they'd send the query
string to the dak maintainers who would (review it for sanity and)
give it a name and install it.  The dak maintainers would take care of
updating it in the face of schema changes.

The plan would be to have the apache on coccia configured to proxy
these requests (probably they'd be JSON over HTTP, but the exact
syntax doesn't matter very much) to the new dak query service.  It
would run on coccia so that it has direct read-only access to the
database, without the sky falling if it turns out to get compromised
(or breaks the machine).
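
For illustration, the service behind that proxy could be as simple as
the following sketch.  The query name is made up and the real thing
would be for the dak maintainers to design; this is just to show the
named-query idea.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    import psycopg2  # assuming the service talks to postgres directly

    # Named queries installed (after review) by the dak maintainers;
    # callers pick a name and supply arguments, never raw SQL.
    QUERIES = {
        "canonical-suite":
            "SELECT codename FROM suite"
            " WHERE suite_name = %s OR codename = %s",
    }

    conn = psycopg2.connect(dbname="projectb")
    conn.set_session(readonly=True, autocommit=True)

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            request = json.loads(self.rfile.read(length))
            sql = QUERIES.get(request.get("query"))
            if sql is None:
                self.send_error(404, "no such query")
                return
            with conn.cursor() as cur:
                # Arguments go in as bind parameters, so quoting is
                # not an issue on this side at all.
                cur.execute(sql, request.get("args", []))
                rows = cur.fetchall()
            body = json.dumps(rows).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    # Listen only on localhost; Apache provides the public-facing side.
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()

(As written, the canonical-suite query happens to want the suite name
twice, once per comparison; the real queries and their argument
conventions would of course be for the dak maintainers to decide.)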

This all sounds good to me.

With this in mind, here are the two queries that dgit currently
makes.  I'm very open to being told these are wrong: I constructed
them by peering into the data in the projectb and seeing what seemed
to produce the right answers.

1. Find out the versions and filenames of source packages in the
   archive:

        SELECT source.version, component.name, files.filename
          FROM source
          JOIN src_associations ON source.id = src_associations.source
          JOIN suite ON suite.id = src_associations.suite
          JOIN dsc_files ON dsc_files.source = source.id
          JOIN files_archive_map ON files_archive_map.file_id = dsc_files.file
          JOIN component ON component.id = files_archive_map.component_id
          JOIN files ON files.id = dsc_files.file
         WHERE ( suite.suite_name='$isuite' OR suite.codename='$isuite' )
           AND source.source='$package'
           AND files.filename LIKE '%.dsc';

   This information is used by dgit to construct a URL from which to
   download the .dsc, and hence the source package.

   I need all the versions, not just the most recent, so that "dgit
   clone" works even if there has just been an upload which is in the
   db but not yet available in the user's view of the ftp archive.

   (The input value $isuite is the suite specified by the user, either
   on the command line or in debian/changelog.  There isn't an SQL
   injection attack here because these strings are syntax-checked; a
   sketch of what I mean appears after these two queries.  Checking
   the syntax was easier than trying to use bind variables from psql.)

2. Find the canonical name of a suite:

        SELECT suite.codename
          FROM suite
         WHERE suite_name='$isuite' OR codename='$isuite';

   dgit needs this so that if you say "dgit clone blah testing" you
   end up with a git branch called "dgit/jessie" (which will remain
   correct when jessie becomes stable) and not "dgit/testing".

   Mark said there was a wrinkle here where some suites have this the
   other way around from others.  In fact it looks a bit more
   complicated than that, judging by the output of "select
   suite_name,codename from suite order by codename;".  Anyway, at the
   moment it looks like all of
       dgit clone blah testing
       dgit clone blah jessie
       dgit clone blah testing-proposed-updates
       dgit clone blah jessie-proposed-updates
   will DTRT.  "dgit clone blah testing-updates" will fail, which is a bit
   annoying but not critical given that you can't "dgit push" to
   testing-updates.
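
Since I mentioned syntax-checking above, here's a sketch in Python of
roughly what I mean.  The character whitelist is one I've picked for
illustration, not copied out of dgit, which may well organise this
differently.

    import re

    # Illustrative whitelist: lowercase alphanumerics plus ".", "+",
    # "-" can never terminate a single-quoted SQL literal.
    _SAFE_RE = re.compile(r"^[a-z0-9][a-z0-9.+-]*$")

    def checked(value, what):
        if not _SAFE_RE.match(value):
            raise ValueError("invalid %s: %r" % (what, value))
        return value

    package = checked("hello", "package name")
    isuite = checked("jessie-proposed-updates", "suite")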

I would like to see this sorted out as soon as possible, both to get
the query strings out of dgit (in case of schema changes) and to make
it easier for non-DDs to use dgit.



Part II:  Integrity and confidentiality
---------------------------------------

We should provide integrity (important) and confidentiality (less so)
for these publicly-available projectb queries.

The obvious answer is to do the queries over https.

We should use a dedicated CA to sign the server's TLS certificate.
The CA certificate would be included in some Debian package, ideally
debian-keyring.  Using a dedicated CA avoids relying on the public
X.509 infrastructure, which is both inconvenient and insecure.

Key rollover will happen as follows: generate a new server TLS
private key and a new CA.  Publish the new CA certificate alongside
the old one in debian-keyring.  Wait for the new debian-keyring
package to be in use "everywhere", and then switch the webserver over
to a server certificate signed by the new CA.  This slightly odd
scheme is needed because, while a TLS server can only present one
certificate chain at a time, a TLS client can be configured to trust
multiple root CAs.
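
The client side of that is easy.  As a sketch (the bundle path is
invented), a Python client would just point its TLS context at a file
containing both CA certificates concatenated:

    import ssl

    # Invented path: a PEM bundle, shipped in (say) debian-keyring,
    # containing the old and the new CA certificates one after the
    # other.  A client using this context trusts either CA, so the
    # server can switch to a certificate from the new CA at any point.
    ctx = ssl.create_default_context(
        cafile="/usr/share/keyrings/dak-query-ca.pem")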

We should use a cipher suite which provides perfect forward secrecy,
because there is no reason not to.
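
That is mostly a matter of webserver configuration, but just to
illustrate, a client built on the context sketched above can insist
on forward secrecy too (the cipher string here is only an example):

    # Accept only ECDHE key exchange, i.e. only forward-secret suites.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")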

CCing keyring-maint about this so they can comment.


Thanks,
Ian.

