
Packaging audit trail mechanism (was: Ubuntu and its "appropriation" of Debian maintainers)



On 5/2/05, Matt Zimmerman <mdz@debian.org> wrote:
> Another option would be to leave the source package maintainer the same (to
> retain proper credit, etc.), but override the binary package maintainer
> during the build (to reflect that it is a different build, and also display
> a more appropriate name in "apt-cache show" etc.).
> 
> What do you think about this approach?

Personally, when I rebuild a package that might get handed to someone
else -- even if I didn't touch the source, but am rebuilding in a
known environment so I can reproduce it later -- I change the
Maintainer field to an e-mail address that reaches me, and add a
debian/changelog entry with an explanation of why it was rebuilt and
an appropriate suffix on the version number.  Otherwise, I'm risking:

1) Implying that the Debian maintainer is part of my organization,
since it appears that he/she was the last person to touch the package;

2) Suggesting that bug reports should be sent directly to the Debian
maintainer and/or BTS, possibly annoying him/her and probably leaving
me and my organization out of an interaction that we ought to know
about;

3) Violating some licenses (the GPL, for instance), at least in
spirit, by making it hard to determine who is responsible for meeting
obligations to provide source code (and, again at least in spirit,
detailed instructions about reproducing the build environment).
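A minimal sketch of the rebuild touch-up described above, assuming the control file is handled as plain text. The function names, the example addresses, and the "+local1" suffix are hypothetical; they are not part of any Debian tool.

```python
import re


def retarget_control(control_text: str, maintainer: str) -> str:
    """Return control_text with its first Maintainer field replaced."""
    # A callable replacement avoids backslash surprises in the new value.
    return re.sub(
        r"(?m)^Maintainer:.*$",
        lambda _match: f"Maintainer: {maintainer}",
        control_text,
        count=1,
    )


def rebuilt_version(version: str, suffix: str) -> str:
    """Append a rebuild suffix (hypothetical convention) to a version."""
    return version + suffix


if __name__ == "__main__":
    original = "Source: foo\nMaintainer: Orig <orig@example.org>\nSection: utils\n"
    print(retarget_control(original, "Rebuilder <rebuilds@example.com>"))
    print(rebuilt_version("1.2-3", "+local1"))
```

The matching debian/changelog entry explaining the rebuild still has to be written by whoever does the rebuild; the sketch covers only the mechanical part.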

When I am distributing unaltered Debian source packages alone, or
bit-exact copies of Debian binary packages, I don't worry as much
about these things.  Actually, in principle I ought to have a cache of
the source packages associated with all binary packages I distribute,
although for one-offs I usually assume I can get it from
snapshot.debian.net if I need it.  (snapshot has saved my bacon more
than once -- thank you Ukai-san and FSIJ!)

If I had Ubuntu's resources, I'd handle it differently.  Relying on
people (or even an automated process) to touch up debian/control and
debian/changelog on rebuild is so 1990s.  A Debian upload isn't
acceptable without a signed changes file, and an autobuilt package
doesn't make it onto ftpmaster without a signed buildd log (as I
understand it, anyway).  Soon it will be practical to install only
signed binary packages (what gets signed in apt 0.6, actually?
md5sums?) on a Debian / Debian-derived system.  I would like to see
all binary packages accompanied by information equivalent to the
contents of a changes file, signed in a way that allows bug reporting
tools to check the chain of trust and choose a bug report destination
accordingly.

I believe that the right way to handle this (no, I don't have code in
my back pocket -- yet) is to use a token for package integrity that
can be multiply signed, and on which those signatures can be revoked,
so that an organization can easily delegate release engineering /
update tracking to an internal guru or a consultant they trust, or
spread it across multiple roles and automated processes.  These
integrity tokens should be distributed using a mechanism that makes it
easy to check the current signature set of the token and to add and
revoke signatures in any order, and this mechanism should be proven to
scale to millions of tokens with thousands of signatures on each.

Stating it this way should make it obvious that I have in mind using
single-use GPG keys as integrity tokens and distributing them with a
network of keyservers.  (Not, obviously, the public keyservers, on
which keys that represent things rather than people have no place.) 
Single-use keys would be generated at the conclusion of the package
build cycle, similar to a changes file except one per .deb.  The
sha1sums of the .deb and .dsc would appear in the key's userid, and
the full vital data for the binary package, for the others built in the
same dpkg-buildpackage run, and for the source from which they were
built would go in, say, a notation on the self-signature.
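To make the token contents concrete, here is a rough sketch of the data that would go into the userid and the notation. The "deb-sha1:/dsc-sha1:" layout and the function names are assumptions chosen for illustration; the proposal does not fix a format, and actual key generation with GPG is left out.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def build_token_fields(deb: bytes, dsc: bytes, sibling_debs: dict) -> dict:
    """Assemble the userid and notation data for one .deb.

    sibling_debs maps file name -> contents for the other .debs produced
    by the same build run (hypothetical input shape).
    """
    # Hypothetical userid layout: both digests, space separated.
    userid = f"deb-sha1:{sha1_hex(deb)} dsc-sha1:{sha1_hex(dsc)}"
    # The notation carries the digests of the sibling packages.
    notation = {name: sha1_hex(data) for name, data in sorted(sibling_debs.items())}
    return {"userid": userid, "notation": notation}


if __name__ == "__main__":
    fields = build_token_fields(b"deb bytes", b"dsc bytes", {"libfoo-dev.deb": b"dev"})
    print(fields["userid"])
    print(fields["notation"])
```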

The sha1sum of a .deb can thus be used to look up sha1sums for its
parent source and its sibling .debs, and given an sha1sum index to
snapshot.debian.net or the Debian derivative's equivalent, the
single-use public key is a reliable clue to fetch the packages
themselves.  The sha1sum of the .dsc is also in the userid to make it
easy to find other binary packages built from the same source, which
facilitates use cases like the "M out of N security experts" mentioned
below, in which some roleplayers' auditing of packages built from the
same source is "good enough".
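The lookups described here can be sketched with an in-memory index standing in for the keyserver. The TokenIndex class and its method names are hypothetical, and a real deployment would query the keyserver network rather than a local dictionary.

```python
from collections import defaultdict


class TokenIndex:
    """Toy stand-in for the keyserver: maps .deb digests to .dsc digests."""

    def __init__(self):
        self._dsc_of = {}                 # deb sha1 -> dsc sha1
        self._debs_of = defaultdict(set)  # dsc sha1 -> set of deb sha1s

    def add(self, deb_sha1: str, dsc_sha1: str) -> None:
        """Record one integrity token's pair of digests."""
        self._dsc_of[deb_sha1] = dsc_sha1
        self._debs_of[dsc_sha1].add(deb_sha1)

    def source_of(self, deb_sha1: str) -> str:
        """Digest of the source package this binary was built from."""
        return self._dsc_of[deb_sha1]

    def same_source_debs(self, deb_sha1: str) -> set:
        """Digests of the other binaries built from the same source."""
        dsc_sha1 = self._dsc_of[deb_sha1]
        return self._debs_of[dsc_sha1] - {deb_sha1}


if __name__ == "__main__":
    index = TokenIndex()
    index.add("deb-a", "dsc-1")
    index.add("deb-b", "dsc-1")
    print(index.source_of("deb-a"), index.same_source_debs("deb-a"))
```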

Once the public half of the key is self-signed with vital data
embedded, the private half is discarded, and the public half is
uploaded to the package keyserver network.  Thereafter, its primary
function is to accumulate signatures (and revocations), which record
the audit trail through whatever processes, human or automated, the
package's users see fit to require.  (Note that
the key isn't used to sign anything but itself, and the sha1sums in
its userid make leakage of the private key harmless except for
possibly tampering with the self-signature on the public half, which
isn't that big a deal anyway; see below.)

A sysadmin for a large network of machines might have an automated
regression test setup that pulls the package as soon as the build is
done, before any human bothers to audit it; that system can autosign
with a machine-level key, which is signed by the sysadmin; the
sysadmin's signature can be revoked later if it is discovered that the
system was compromised in some way.  A paranoid organization might
want to security-audit source code changes (presumably also running
their own autobuilder), and would require that the signatures of M out
of N known security experts (and no revocations from the other N-M) be
present on keys with matching source package sha1sums before a
production machine will install that package.  (This obviously
involves more complex lookups than just "chain of trust", but the
keyserver has the necessary data in an easily gathered form.)  And so
forth.
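One possible reading of the "M out of N" rule, sketched under the assumption that signatures and revocations have already been fetched and verified. The Token class and the function name are hypothetical, and the rule is interpreted as: at least M of the N experts have signed some key with the matching source sha1sum, and no expert has revoked a signature on any of them.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical view of one integrity token after signature checks."""
    deb_sha1: str
    dsc_sha1: str
    signers: set = field(default_factory=set)   # live signatures
    revokers: set = field(default_factory=set)  # signers who revoked


def source_is_approved(tokens, dsc_sha1, experts, m) -> bool:
    """True if at least m experts signed tokens for this source and
    none of the experts has revoked a signature on any of them."""
    experts = set(experts)
    signed, revoked = set(), set()
    for token in tokens:
        if token.dsc_sha1 != dsc_sha1:
            continue  # token belongs to a different source package
        signed |= token.signers & experts
        revoked |= token.revokers & experts
    if revoked:
        return False
    return len(signed) >= m


if __name__ == "__main__":
    tokens = [Token("deb-a", "dsc-1", {"alice"}), Token("deb-b", "dsc-1", {"bob"})]
    print(source_is_approved(tokens, "dsc-1", {"alice", "bob", "carol"}, 2))
```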

Role-level keys and/or signature notations can specify what aspect of
the package's integrity is being signed off on.  A "Report-Bugs-To"
key/notation could guide bug tracking tools in selecting the
appropriate destinations; a Debian maintainer who is interested in
getting bug reports for the Ubuntu versions of his/her packages could
sign a role key stewarded by an Ubuntu person (authorized to feed it
to a system that signs autobuilder output at a specified stage of QA),
and if he/she decides that the Ubuntu package has diverged too much,
the signature on the role key can be revoked.  Signature-aware bug
reporting tools would automatically pick up the current set of
appropriate recipients by using the sha1sum of the package (stashed by
dpkg at install time) to fetch its integrity token and its chain of
trust.
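A sketch of how a signature-aware bug reporting tool might pick recipients, assuming the chain of trust has already been reduced to a set of trusted signer names. The Signature class, the notation representation, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical summary of one signature on an integrity token."""
    signer: str
    notations: tuple = ()   # ((name, value), ...)
    revoked: bool = False


def bug_destinations(signatures, trusted_signers) -> list:
    """Collect Report-Bugs-To values from live, trusted signatures."""
    destinations = []
    for sig in signatures:
        if sig.revoked or sig.signer not in trusted_signers:
            continue  # revoked or outside the chain of trust
        for name, value in sig.notations:
            if name == "Report-Bugs-To" and value not in destinations:
                destinations.append(value)
    return destinations


if __name__ == "__main__":
    sigs = [
        Signature("derivative-qa", (("Report-Bugs-To", "bugs@derivative.example"),)),
        Signature("debian-maint", (("Report-Bugs-To", "maint@debian.example"),), revoked=True),
    ]
    print(bug_destinations(sigs, {"derivative-qa", "debian-maint"}))
```

In this toy run the Debian maintainer has revoked, so only the derivative's address is returned.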

Note that there is no need for an additional data signature covering
the .deb, or its md5sums, either as a separate file or as an appendix
to the .deb; the tokens can of course be put on install CDs for
convenience of installation, but you really want integrity tokens, and
the keys used to sign them, hot off the keyserver most of the time. 
And now that it's got the sha1sum in the userid (which I left out the
last time I suggested something like this), I see no reason to object
to the single-use key as a form of "detached signature", because the
token -> deb mapping is practically impossible to subvert.
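The token -> deb check itself is small. The sketch below assumes a userid of the form "deb-sha1:<hex> dsc-sha1:<hex>", which is an illustrative layout rather than anything the proposal specifies.

```python
import hashlib
import hmac


def deb_matches_token(deb_bytes: bytes, userid: str) -> bool:
    """True if the .deb's SHA-1 equals the digest named in the userid."""
    # Split "key:value" pairs; parts without a colon get an empty value.
    fields = {k: v for k, _, v in (p.partition(":") for p in userid.split())}
    expected = fields.get("deb-sha1", "")
    actual = hashlib.sha1(deb_bytes).hexdigest()
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    uid = "deb-sha1:" + hashlib.sha1(b"payload").hexdigest() + " dsc-sha1:" + "0" * 40
    print(deb_matches_token(b"payload", uid))
    print(deb_matches_token(b"tampered", uid))
```

A package manager would run this at install time, before consulting the signatures on the token.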

When people or auditing tools sign the single-use key, what they're
really approving is the sha1sum in its userid.  So the worst that an
attacker could do (assuming that SHA1 is not incredibly broken) is to
substitute a key with the same userid (and hence the same sha1sum) but
different data in the self-signature notation.  But presumably one of
the "role" signatures I require in my trust analysis is the
autobuilder maintainer's (signed in turn by an ftpmaster), and that's
my guarantee that the auxiliary data in the self-signature is correct
(and that the private key was discarded so that the self-signature
can't be tampered with).

Is this in some sense an abuse of GPG/PGP and keyservers?  Not any
more than serving HTTP/1.0 via apache run from inetd is an abuse of
TCP and inetd.  Yes, the use cases for which PGP/GPG and keyservers
were conceived and designed involved long-lived keys that represented
real people, just as TCP and inetd were conceived for long telnet/FTP
sessions.  But there's no particular reason not to use the same robust
design and mature implementations for a different set of
application-level use cases, as long as you use a different port.

Implementations welcome.  :-)  Given the number and variety of urgent
things at the day job, and the amount of time and attention that I
have found my nine-month-old daughter needs, it looks like it may take
me a while longer to create enough breathing room to get around to it
myself.

Cheers,
- Michael


