
Bug#786644: reproducible builds should vary whether nocheck is added to DEB_BUILD_OPTIONS



Hi Guillem,

> > > I'd expect that setting DEB_BUILD_OPTIONS=nocheck on a package build
> > > should not change the resulting binary packages. It might make the build
> > > succeed despite being broken, but if it succeeds without nocheck, it
> > > should be no different when enabled.
> > 
> > Policy is, however, silent on whether that is the correct behaviour or
> > not.
> 
> Policy is silent on many aspects of the distribution, in many cases
> because they are obviously correct or buggy. Here I'd say the former.

I would disagree that it is obvious that nocheck should not change the 
contents of the package. It is entirely reasonable for test results or test 
outputs to be shipped as examples or in the documentation, and hence entirely 
reasonable for nocheck to cause a difference in the package.

Other DEB_BUILD_OPTIONS (nostrip and noopt) are supposed to cause the 
resultant .deb to be different, so it is not true that package contents must 
be invariant under DEB_BUILD_OPTIONS. I can even conceive of situations (such 
as parallel compressors) where parallel=n could change the output.
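To illustrate the point about nocheck, here is a minimal sketch of the usual 
pattern: a build step that runs only when "nocheck" is absent from 
DEB_BUILD_OPTIONS. The function name and echoed strings are hypothetical 
(real packages express this with make conditionals in debian/rules rather 
than a shell function); the point is that if the test log it produces is 
later installed into the .deb, building with and without nocheck yields 
different package contents.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not from any real debian/rules):
# run the test suite only when "nocheck" is absent from DEB_BUILD_OPTIONS.
run_tests_unless_nocheck() {
    # pad with spaces so "nocheck" matches only as a whole word
    case " ${DEB_BUILD_OPTIONS:-} " in
        *" nocheck "*)
            # tests skipped: no test log is produced at all
            echo "tests skipped"
            ;;
        *)
            # a real rules file might run: make check > test-results.log
            # and later install the log under /usr/share/doc/<pkg>/
            echo "tests ran"
            ;;
    esac
}

DEB_BUILD_OPTIONS="nocheck parallel=4" run_tests_unless_nocheck
# prints "tests skipped"
```

Under this (assumed) layout, the package built with nocheck lacks the test 
log that the normal build ships, so the two .debs legitimately differ.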

> > Clarifying policy as to what the correct behaviour should be seems to be a
> > necessary first step.
> 
> I don't see why, in this specific case less so when the request is
> being done on a feature (reproducibility) that is neither in policy.

Precisely because reproducibility is not mandated in policy or even encouraged 
in devref, I think it is important to only test variations that are commonly 
agreed-upon as being worth being robust to. Agreement can be evidenced by 
discussion on mailing lists or by codifying in Policy.

So far, there is broad agreement that builds should be robust to when the 
build was started, the path in which the build runs, the user running the 
build, and so on. Rapid progress has been made in improving robustness 
against variations in these things, with the cooperation of many maintainers, 
precisely because these goals are widely agreed upon.

The counterpoint would be declaring a package irreproducible because (say) 
the outputs from gcc-4.8 and gcc-4.9 differed. Such an irreproducibility 
label would not be widely supported. It would certainly be possible to create 
a test harness that rebuilt packages twice with different gcc versions; 
someone could file a wishlist bug asking reproducible.d.n to do this, and 
bugs could then be filed against offending packages asking for cooperation in 
making them reproducible under this scenario. Would anyone modify their 
packages to deal with such reproducibility problems? Of course not. Would 
adding a gcc-version test like this harm the wider reproducibility effort? 
Most likely. If the reproducibility effort gets a reputation for pushing 
changes for which there is no widespread agreement, or for reporting problems 
that others consider to be false positives, then the effort will be doomed to 
fail, and that would be disappointing. [1]

My conjecture is that "nocheck" is neither obviously something to be robust 
against (like a timestamp variation) nor obviously something to dismiss as 
outlandish (like a gcc variation). Given that it sits somewhere in between, 
finding out whether there is a consensus on what nocheck should do seems wise 
before labelling packages as reproducible or irreproducible.

> > That said, the policy editors are often interested in seeing scope of the
> > impact of any change and the only way of knowing how many packages would
> > be
> > made instabuggy by this change is to include it in the tests...
> 
> The check does not impose anything on package maintainers, like
> migration blocking or similar. So even if there was a substantial
> amount of packages that would fail that test, it would still be useful
> information for the reproducible effort IMO.

Except that reproducibility is boolean valued and that boolean is exposed to 
maintainers: there is either a nice little tick on the maintainer's QA page 
or a nasty little cross. If nasty little crosses come from what are 
considered to be poor tests generating incorrect results, history tells us 
that the entire column will be ignored. That makes including tests for which 
there is no broad agreement a net loss for each maintainer, for 
reproducible.d.n, and for the project.

cheers
Stuart


[1] historical footnote: 10ish years ago, lintian had a reputation for 
producing what many maintainers considered to be spurious complaints. The 
result was that they didn't bother running it at all and so never saw its 
legitimate complaints either. I recall experienced maintainers at the time 
telling me that lintian was only suitable for hello-world packages, not for 
anything real. The signal:noise ratio in that QA tool was poor and so it 
became devalued; it took many years of hard work to shed that reputation.


-- 
Stuart Prescott    http://www.nanonanonano.net/   stuart@nanonanonano.net
Debian Developer   http://www.debian.org/         stuart@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7

