Reproducibility of image building (Re: Debian images on Microsoft Azure cloud)

Hi Marcin and everybody,

about reproducibility:

On Sat, Nov 21, 2015 at 03:17:22PM +0000, Marcin Kulisz wrote:
> I'm not sure if it's possible to upload an image and to build one so that they
> are bit for bit identical, for reasons such as timestamps on files, etc. I think
> that at least some providers are adding some metadata which would change any
> checksums produced before upload.


In this discussion and before, I think that there is a strong consensus that
there must be some reproducibility in image building, but we have difficulty
translating this into a concrete requirement.

Requiring that two images built at different times are bitwise identical is not
realistic, not only because of time stamps, but also because some elements of
configuration will differ, for instance the location of the package sources.

Having checksums of all the files on a given image would be nice, but let's
note that this is not a requirement currently.  At the moment, I think that we
should not request that the file checksums stay identical over rebuilds in the
same environment: this would restrict design choices for the image builders
(on timestamps, logs, etc.), and therefore put pressure on the people writing

Of course, some of these goals can become standard practice later, but I think
that this should evolve through consensus involving the people and teams
developing image builders.  Doing it the other way round would be hitting those
who do the work with a trademark stick, which would be counterproductive, to
put it mildly.

Altogether, for reproducibility, would the following be acceptable?
(The wording, of course, can be improved.)

 * When building an image twice in a row with the same package source
   and parameters:
   - the packages installed must be the same;
   - the files created must be the same;
   - the content of the files created may differ;

 * When releasing an image, a list of all the packages installed and a list of
   checksums of all the files must be provided.

 * For files whose checksums vary, it would be good to provide their list
   and an explanation of why they vary, although it is not a strict requirement.
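
To make the file-related points above more concrete, here is a minimal sketch
in Python of how such a check could look.  It is only an illustration, not a
proposal for a specific tool: the function names (manifest, compare) are made
up for this example, it assumes that each image is available as a mounted
directory tree, and it uses SHA-256 although nothing above mandates a
particular checksum.

```python
import hashlib
import os


def manifest(root):
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # this sketch skips symlinks and special files
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            result[os.path.relpath(path, root)] = digest
    return result


def compare(first, second):
    """Compare the manifests of two builds made with the same sources."""
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    common = set(first) & set(second)
    varying = sorted(p for p in common if first[p] != second[p])
    return {
        # "the files created must be the same"
        "same_file_set": not only_first and not only_second,
        "only_in_first": only_first,
        "only_in_second": only_second,
        # "the content of the files created may differ": reported, not an error
        "varying_content": varying,
    }
```

Calling compare(manifest(a), manifest(b)) on two mounted builds (a and b being
hypothetical mount points) would give the list of files to document under the
third point.  The list of installed packages, for instance the output of
"dpkg-query -W", could be compared in the same spirit.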

Have a nice day,

Charles Plessy
Tsurumi, Kanagawa, Japan

