Including build metadata in packages



A while ago I noticed that binutils embeds some logs in one of its
packages. These logs include timing information about the test suite
runs, which will almost certainly differ between builds, even on the
exact same machine:

  https://bugs.debian.org/950585

My proposed patch removed the timing information and various other
things, but that turned out to be exactly the information wanted from
these files, so it was not an appropriate patch.


It also became known that other key toolchain packages (e.g. gcc) embed
similar log files in their .deb packages... I have since found a few
other packages that do similar things:

  https://tests.reproducible-builds.org/debian/issues/unstable/test_suite_logs_issue.html


Obviously, this would interfere with any meaningful reproducible builds
testing for any package that did something like this. Ideally metadata
like this about a build should *not* be included in the .deb files
themselves.


I'll try to summarize and detail a bit some of the proposed strategies
for resolving this issue:


* output plaintext data to the build log

Some of these log files are large (>13MB? per architecture, per package
build) and would greatly benefit from compression...

How large is too large for this approach to work?

Relatively simple to implement (at least for plain text logs), but
potentially stores a lot of data on the buildd infrastructure...


* Selectively filter out known unreproducible files

This adds complexity to the process of verification; you can't beat the
simplicity of comparing checksums on two .deb files.

With increased complexity comes increased opportunity for errors, as
well as maintenance overhead.

RPM packages, for example, embed signatures in the packages, and these
need to be excluded for comparison.

I vaguely recall at least one case where something like this was
attempted in the past and resulted in packages incorrectly being
reported as reproducible because the filter was overly broad...

Some nasty corner cases probably lurk down this approach...
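
To illustrate the overly-broad-filter risk, here is a minimal sketch in
Python. It is not taken from any existing tool: the function name, the
paths and the glob patterns are all made up for illustration, and the
package contents are modeled as a plain dict of path to bytes rather
than read from real .deb files.

```python
import fnmatch
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def differing_paths(build_a, build_b, ignore_patterns=()):
    """Return the sorted paths whose contents differ between two builds.

    build_a and build_b map a path inside the package to its bytes.
    Paths matching any glob in ignore_patterns are skipped entirely.
    Note that fnmatch's "*" also matches "/" characters.
    """
    def ignored(path):
        return any(fnmatch.fnmatchcase(path, pat) for pat in ignore_patterns)

    diffs = []
    for path in sorted(set(build_a) | set(build_b)):
        if ignored(path):
            continue
        a = build_a.get(path)
        b = build_b.get(path)
        # A file missing from one side counts as a difference.
        if a is None or b is None or digest(a) != digest(b):
            diffs.append(path)
    return diffs


# Two hypothetical builds: the log differs (expected), but so does the binary.
first = {
    "usr/bin/tool": b"binary built at 10:01",
    "usr/share/doc/tool/test-summary.log": b"elapsed 12.3s",
}
second = {
    "usr/bin/tool": b"binary built at 10:07",
    "usr/share/doc/tool/test-summary.log": b"elapsed 12.9s",
}

# A narrow filter hides only the log and still reports the binary.
narrow = differing_paths(first, second, ["usr/share/doc/*/test-summary.log"])

# An overly broad filter hides everything; the package looks reproducible.
broad = differing_paths(first, second, ["usr/*"])
```

With the narrow pattern the differing binary is still reported; with
the broad one nothing is reported at all, which is exactly the kind of
false positive described above.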


* Split build metadata into a separate .deb file

This shares some of the problems of the previous approach, though it
may be a little easier to get a reliable exclusion pattern? It wouldn't
require huge toolchain changes.

I would expect that such packages not actually be depended on by any
other packages, and *only* contain build metadata. Maybe named
SOURCEPACKAGE-buildmetadata-unreproducible.deb ... or.... ?

Not beautiful or elegant, but maybe actually achievable for the
bookworm release cycle?
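
A rough sketch of what the comparison could look like with this
approach, in Python. The "-buildmetadata-unreproducible" suffix is just
the name floated above, not an existing convention, and the package
names, versions and contents below are invented for the example.

```python
import hashlib

# Hypothetical naming convention for metadata-only packages.
METADATA_SUFFIX = "-buildmetadata-unreproducible"


def is_metadata_deb(filename):
    """True if a .deb filename (name_version_arch.deb) is a metadata package."""
    package = filename.split("_", 1)[0]
    return package.endswith(METADATA_SUFFIX)


def unreproducible_debs(checksums_a, checksums_b):
    """Return the sorted .deb filenames whose checksums differ between builds.

    checksums_a and checksums_b map a .deb filename to its SHA-256 digest.
    Metadata-only packages are excluded by package name, so every other
    package is still compared as a whole file.
    """
    names = sorted(set(checksums_a) | set(checksums_b))
    return [
        name
        for name in names
        if not is_metadata_deb(name)
        and checksums_a.get(name) != checksums_b.get(name)
    ]


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Invented example: only the metadata package differs between the builds.
build_a = {
    "foo_1.0-1_amd64.deb": sha256(b"identical contents"),
    "foo-buildmetadata-unreproducible_1.0-1_amd64.deb": sha256(b"logs, run 1"),
}
build_b = {
    "foo_1.0-1_amd64.deb": sha256(b"identical contents"),
    "foo-buildmetadata-unreproducible_1.0-1_amd64.deb": sha256(b"logs, run 2"),
}

result = unreproducible_debs(build_a, build_b)
```

The exclusion works on whole package names rather than on paths inside
a package, so the remaining .deb files can still be verified with a
plain checksum comparison.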


* Split build metadata into a separate file or archive

Some of the debian-installer packages generate tarballs that are not
.deb files and are included in the .changes files when uploading to the
archive; making a similar generalized option for other packages to put
build metadata into a separate artifact might be a workable approach,
although this would presumably require toolchain changes in dpkg and dak
at the very least, and might take a couple of release cycles, which
is... well, debian.

The possibility of bundling up .buildinfo files into this metadata too,
while requiring some changes in the relevant dpkg, dak, etc. tooling,
might in the long term be worth exploring.

There was a relevant bug report in launchpad:

  https://bugs.launchpad.net/launchpad/+bug/1845159

This seems like the best long-term approach, but pretty much *only* a
long-term approach...


I'd really like to remove this hurdle to reproducible builds from some
key packages like binutils and gcc, but I'm also curious about a
generalizable approach so each package needing something like this
doesn't reinvent the wheel in incompatible ways...


Curious to hear your thoughts!


live well,
  vagrant

p.s. please consider CCing me and/or
reproducible-builds@lists.alioth.debian.org, as I'm not subscribed to
debian-devel.
