Re: source-only builds and .buildinfo
> On Wed, Jun 21, 2017 at 10:09:00AM +0000, Ximin Luo wrote:
>> Adrian Bunk:
>>> How is that supposed to work when the compiler is not exactly identical?
>>> As an example, gcc-6 6.3.0-18 and gcc-6 6.3.0-19 will likely produce
>>> different output for every non-trivial piece of software.
>>> The reason is that every new gcc upload usually contains whatever
>>> bugfixes are on the upstream branch.
>> It would depend on the situation which dependencies should be "irrelevant" towards the final output, right. If the dependencies are different and the buildinfo is different, it does not necessarily mean there is a problem, the upload does not need to be rejected. But it's a signal that other people (including the uploader) might want to re-try the build with the newer dependencies.
>> OTOH if the outputs match, we get more certainty, which is a good thing.
> "more certainty" on what exactly?
More certainty that the uploaded binaries were actually produced from the source code, rather than by a malicious or compromised machine.
> "signal that other people might want to" is quite vague,
> what do you want to prove and how exactly should people
> spend time proving it?
That the binaries uploaded were actually produced from the source code. People spend time proving it by running the build again and seeing if the binaries match, possibly also recreating various aspects of previous build environments recorded in other .buildinfo files.
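To make "running the build again and seeing if the binaries match" concrete, here is a rough sketch in Python. It assumes the .buildinfo file lists its artifacts in a Checksums-Sha256 field with one "digest size filename" entry per continuation line, which is my understanding of the format. The function names and the idea of passing the rebuilt artifacts in as a dict of bytes are made up for illustration; this is not an existing tool.

```python
import hashlib


def parse_sha256_checksums(buildinfo_text):
    """Return {filename: sha256 hex digest} from the Checksums-Sha256 field.

    Assumes one "<digest> <size> <filename>" entry per indented line.
    """
    checksums = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            if line.startswith((" ", "\t")):
                parts = line.split()
                if len(parts) == 3:
                    digest, _size, filename = parts
                    checksums[filename] = digest
            else:
                # A non-indented line starts the next field.
                in_field = False
    return checksums


def compare_rebuild(buildinfo_text, rebuilt_artifacts):
    """Compare claimed digests against locally rebuilt artifacts.

    rebuilt_artifacts is a hypothetical {filename: bytes} mapping holding
    the output of your own rebuild.
    """
    result = {}
    for filename, digest in parse_sha256_checksums(buildinfo_text).items():
        data = rebuilt_artifacts.get(filename)
        if data is None:
            result[filename] = "missing"
        elif hashlib.sha256(data).hexdigest() == digest:
            result[filename] = "match"
        else:
            result[filename] = "mismatch"
    return result
```

A "mismatch" here only says the outputs differ. It does not say why, which is where the recorded build environment comes in.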
> In the best case [excluding the binary-all special case] we would know that the buildd on the one
> architecture that happens to be used by the person doing the
> source upload produced the same binaries.
> Once you start verifying that all binaries in the archive were built
> from the sources in the archive, this will automatically be covered.
What we'd like to aim for is to give users some security guarantee, *independent of the distributor (i.e. Debian or DD uploaders)*, that the binaries they're using were actually produced from the source code.
One way to give security that is independent of third parties is to provide some sort of mathematically verifiable proof. However, the world isn't at that stage yet for compiler technology.
Buildinfo files are more like claims than proofs. Whilst one can be used as a proof, i.e. by running the build yourself, this is an expensive process which we can't expect most users to carry out, and it doesn't really fit the idea of a "proof" in a security system, which is supposed to be low-cost for verifiers.
For users who can't directly verify everything that they themselves run, one "next best thing" is to check that different parties that they trust (or many parties that they don't individually trust, but that they nevertheless believe are probably not all colluding to attack them) claim to have performed the build or verified each other's proofs.
So the more buildinfo files we have from different parties (DDs, the Debian archive, etc.), the better this is for users, because they have more sources of claims. How much they "trust" each individual source is indeed not concretely measurable, and unfortunately no existing security system tries to model this more precisely; however, I think we can all agree that "more is better" here.
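As a very rough illustration of "more sources of claims", a user-side check could count how many distinct parties claim the same output digest and accept it only above some threshold. Everything in this sketch is hypothetical: the party names, the threshold, and the policy of refusing to choose when two digests both reach the threshold. It deliberately does not try to model how much each party is trusted.

```python
from collections import defaultdict


def tally_claims(claims):
    """Group claims by digest.

    claims is an iterable of (party, sha256 hex digest) pairs, e.g. one
    pair per buildinfo file the user has collected.
    """
    by_digest = defaultdict(set)
    for party, digest in claims:
        # A set means a party repeating its own claim counts only once.
        by_digest[digest].add(party)
    return dict(by_digest)


def accepted_digest(claims, threshold):
    """Return the digest claimed by at least `threshold` distinct parties.

    Returns None if no digest reaches the threshold, or if more than one
    does (an ambiguous situation this sketch refuses to resolve).
    """
    tally = tally_claims(claims)
    winners = [d for d, parties in tally.items() if len(parties) >= threshold]
    return winners[0] if len(winners) == 1 else None
```

For example, with claims from a DD uploader, a buildd and two independent rebuilders, a threshold of 3 accepts a digest only if three of those four agree on it.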
Therefore, there is still value in using DDs' uploaded buildinfo files, even if the buildds are "likely" to use different dependencies and "likely" to produce different binaries. If they have identical output, great, they get a nice green tick somewhere. If not, people can run the builds again to try to get the identical output. Some builds are indeed not reproducible today, but a mismatch there indicates a bug rather than a compromised builder.
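When the outputs do differ, the first question is usually whether the build environments differed. Here is a small sketch, assuming the Installed-Build-Depends field lists one "package (= version)," entry per continuation line (again my understanding of the format, not something checked against every dpkg version). The function names are made up for illustration.

```python
import re

# Matches an entry such as " gcc-6 (= 6.3.0-18),"
_DEP = re.compile(r"^\s*(\S+)\s+\(=\s*([^)]+)\)\s*,?\s*$")


def parse_installed_build_depends(buildinfo_text):
    """Return {package: version} from the Installed-Build-Depends field."""
    deps = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            continue
        if in_field:
            if not line.startswith((" ", "\t")):
                # A non-indented line starts the next field.
                in_field = False
                continue
            match = _DEP.match(line)
            if match:
                deps[match.group(1)] = match.group(2).strip()
    return deps


def diff_build_depends(a_text, b_text):
    """Return {package: (version_in_a, version_in_b)} for every difference.

    A package present on only one side shows up with None on the other.
    """
    a = parse_installed_build_depends(a_text)
    b = parse_installed_build_depends(b_text)
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(set(a) | set(b))
        if a.get(pkg) != b.get(pkg)
    }
```

An empty result together with differing binaries points at a reproducibility bug (or something worse). A non-empty result tells a rebuilder which versions to pin when retrying.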
Besides, I think "non-identical builds due to changed dependencies" won't actually be so likely in practice. For example, gcc-6 6.3.0-18 was the current version for 3-4 weeks, and plenty of uploads happened during that time. Most DDs would update, build and upload within several minutes or hours of each other.