Re: Reproducibility of image building (Re: Debian images on Microsoft Azure cloud)
I think the content of the files created should not differ when building
an image twice in a row with the same package source and parameters.
To solve the timestamp problem, packaged files should either not include
timestamps or have them reset to 00:00:00 or some other calculated value.
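
A rough sketch of the idea in Python, only as an illustration: it resets
the timestamps of every file in a build tree to one fixed value, then
compares the checksums of two builds. The function names and the choice of
the Unix epoch as the fixed value are assumptions made for this example;
they are not taken from any existing image builder.

```python
import hashlib
import os

# Assumption for this sketch: clamp to the Unix epoch (1970-01-01 00:00:00 UTC).
FIXED_MTIME = 0


def normalize_mtimes(root, mtime=FIXED_MTIME):
    """Set atime and mtime of every file and directory under root to one value."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are left alone in this sketch
            os.utime(path, (mtime, mtime))
    os.utime(root, (mtime, mtime))


def checksum_manifest(root):
    """Return {relative path: SHA-256 hex digest} for regular files under root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def differing_files(manifest_a, manifest_b):
    """List paths that exist in only one build or whose checksums differ."""
    paths = set(manifest_a) | set(manifest_b)
    return sorted(p for p in paths if manifest_a.get(p) != manifest_b.get(p))
```

Running normalize_mtimes() on two build trees and then passing their
manifests to differing_files() gives the list of files whose content still
varies between builds, which is the list that would need an explanation.
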
On 23/11/15 at 02:04, Charles Plessy wrote:
> Hi Marcin and everybody,
>
> about reproducibility:
>
> On Sat, Nov 21, 2015 at 03:17:22PM +0000, Marcin Kulisz wrote:
>>
>> I'm not sure it's possible to upload an image and to build one such that they
>> are bit-for-bit identical, for reasons like timestamps on files, etc. I think
>> that at least some providers add some metadata which would change any
>> checksums produced before upload.
>
> Indeed.
>
> In this discussion and before, I think that there is a strong consensus that
> there must be some reproducibility in image building, but we have difficulty
> translating this into a concrete requirement.
>
> Requiring that two images built at different times are bitwise identical is not
> realistic, not only because of time stamps, but also because some elements of
> configuration will differ, for instance the location of the package sources.
>
> Having checksums of all the files on a given image would be nice, but let's
> note that this is not a requirement currently. At the moment, I think that we
> should not request that the file checksums stay identical over rebuilds in the
> same environments: this would restrict design choices for the image builders
> (on timestamps, logs, etc.), and therefore put pressure on the people writing
> them.
>
> Of course, some of these goals can become standard practice later, but I think
> that this should evolve through consensus involving the people and teams
> developing image builders. Doing it the other way round would be hitting those
> who do the work with a trademark stick, which would be counterproductive, to
> put it mildly.
>
> Altogether, for reproducibility, would the following be acceptable?
> (The wording, of course, can be improved.)
>
> * When building an image twice in a row with the same package source
> and parameters:
> - the packages installed must be the same;
> - the files created must be the same;
> - the content of the files created may differ;
>
> * When releasing an image, a list of all the packages installed and a list of
> checksums of all the files must be provided.
>
> * For files whose checksums vary, it would be good to provide a list of them
> and an explanation of why they vary, although it is not a strict requirement.
>
> Have a nice day,
>