
Re: Reproducibility of image building (Re: Debian images on Microsoft Azure cloud)

> On Mon, Nov 23, 2015 at 2:04 AM Charles Plessy <plessy@debian.org> wrote:
> >
> > Altogether, for reproducibility, would the following be acceptable ?
> > (Wording, of course, can be improved)
> >
> >  * When building an image twice in a row with the same package source
> >    and parameters:
> >    - the packages installed must be the same;
> >    - the files created must be the same;
> >    - the content of the files created may differ;
> >
> >  * When releasing an image, a list of all the packages installed and a
> >    list of checksums of all the files must be provided.
> >
> >  * For files whose checksums vary, it would be good to provide their list
> >    and an explanation of why they vary, although it is not a strict
> >    requirement.

On Mon, Nov 23, 2015 at 08:25:03AM +0000, Anders Ingemann wrote:
> Remember logfiles, they have the same problem with timestamps. Though tbh
> we do our best with bootstrap-vz to not leave anything behind from the
> bootstrapping process, so it shouldn't be a problem.
> > - the content of the files created may differ;
> I think you mean "may *not* differ"...?

Hi Anders,

I thought about logfiles, and wondered if there would be a demand to keep some
of them, for instance for auditing purposes.

Because of this, of timestamps, and of possibly other sources of variation that
I did not think of, I think that it is good to state in black and white that
files *may differ* between two image builds, even when they are run with the
same parameters from the same archive source.

Pinpointing which ones and why is a bonus, but I hope that some day it will be
part of the standard practice.
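
To illustrate the kind of check I have in mind, here is a minimal sketch in
Python. It assumes that each build publishes a manifest in the
"<checksum>  <path>" text format printed by tools such as sha256sum; the
function names and the sample data are hypothetical. An empty result for the
first two lists means that the same files were created; the third list gives
the files whose content varies and would deserve an explanation.

```python
def parse_manifest(text):
    """Parse lines of the form '<checksum>  <path>' into a {path: checksum} dict."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        checksum, path = line.split(None, 1)
        manifest[path] = checksum
    return manifest


def compare_manifests(first, second):
    """Return (only in first, only in second, same path but different checksum)."""
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    differing = sorted(
        path for path in set(first) & set(second) if first[path] != second[path]
    )
    return only_first, only_second, differing


# Hypothetical manifests from two builds of the same image.
build_a = parse_manifest("aaa  /etc/hostname\nbbb  /var/log/dpkg.log\n")
build_b = parse_manifest(
    "aaa  /etc/hostname\nccc  /var/log/dpkg.log\nddd  /etc/extra\n"
)

only_a, only_b, differing = compare_manifests(build_a, build_b)
print("only in first build: ", only_a)
print("only in second build:", only_b)
print("content differs:     ", differing)
```

With the sample data, /etc/extra would break the "files created must be the
same" rule, while /var/log/dpkg.log would only need to be listed among the
files that vary.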

On Mon, Nov 23, 2015 at 04:14:20PM +0100, Thomas Goirand wrote:
> >  * When releasing an image, a list of all the packages installed and
> >    a list of checksums of all the files must be provided.
> That's a nice idea, but not very useful. I very much prefer what Steve
> has produced: a tarball with all source packages [1], which is much more
> in the DFSG style. The md5sum of each file is in any case
> stored in /var/lib/dpkg/info/*.md5sums within the image.

Hi Thomas,

/var/lib/dpkg/info/*.md5sums only lists the files that are distributed in the
packages, not the ones created by maintainer scripts nor the ones included in
the image for other reasons.  In my understanding, the goal of asking for a
list of checksums is to help detect those files, and to help detect which of
them vary between builds, in order to better audit images and ensure that
nothing inappropriate for Debian has been added to them: non-free material,
and of course malware and the like.
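
As a rough sketch of how such files could be spotted, the Python snippet below
walks an unpacked image tree and reports every file that no
/var/lib/dpkg/info/*.md5sums entry accounts for. It assumes that the image is
mounted or extracted under a directory (the "root" argument, a hypothetical
name) and that the md5sums files use the usual "<md5>  <relative/path>" lines
without a leading slash. It is only a starting point for an audit, not a
complete tool.

```python
import os
from pathlib import Path


def packaged_paths(root):
    """Collect every path listed in the dpkg md5sums files under root."""
    info_dir = Path(root) / "var/lib/dpkg/info"
    paths = set()
    for listing in info_dir.glob("*.md5sums"):
        text = listing.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if line.strip():
                _md5, relative_path = line.split(None, 1)
                paths.add(relative_path)
    return paths


def unpackaged_files(root):
    """Return files present under root that no md5sums file accounts for."""
    known = packaged_paths(root)
    found = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            relative_path = os.path.relpath(os.path.join(dirpath, name), root)
            if relative_path not in known:
                found.append(relative_path)
    return sorted(found)
```

Calling unpackaged_files("/mnt/image") (a hypothetical mount point) would list,
among others, the dpkg database files themselves, the files created by
maintainer scripts, and anything added by the image build tool.  Those are the
files for which a published list of checksums brings new information.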

Regarding the distribution of sources, indeed it seems to me that some
licenses, in particular the GPLv2, do not allow distributing a cloud image
without the source in the same location, unless impractical measures are taken
("a written offer, valid for at least three years, to give any third party, for
a charge no more than your cost of physically performing source distribution, a
complete machine-readable copy of the corresponding source code", ...).
Nevertheless, I do not see other image providers respecting this, and nobody
seems to have formally objected so far (correct me if I am wrong).  I am
inclined to think that a Stable image is sufficiently covered by the source
packages in ftp.debian.org and snapshot.debian.org.

Have a nice day,


Charles Plessy
Tsurumi, Kanagawa, Japan
