Bug#932795: How to handle FTBFS bugs in release architectures

To: 932795@bugs.debian.org
Subject: Bug#932795: How to handle FTBFS bugs in release architectures
From: Santiago Vila <sanvila@unex.es>
Date: Fri, 30 Aug 2019 13:22:49 +0200 (CEST)
Message-id: <alpine.DEB.2.20.1908301108520.18045@tulipan.isla-invisible.es>
Reply-to: Santiago Vila <sanvila@unex.es>, 932795@bugs.debian.org
In-reply-to: <20190725142733.GA9038@espresso.pseudorandom.co.uk>
References: <alpine.DEB.2.20.1907231343060.14879@tulipan.isla-invisible.es> <alpine.DEB.2.20.1907231343060.14879@tulipan.isla-invisible.es> <20190724110545.GA23749@espresso.pseudorandom.co.uk> <alpine.DEB.2.20.1907231343060.14879@tulipan.isla-invisible.es> <20190725112242.og3gey4fmpwnt3od@nucold> <20190725142733.GA9038@espresso.pseudorandom.co.uk> <alpine.DEB.2.20.1907231343060.14879@tulipan.isla-invisible.es>

Simon McVittie wrote:
> On Thu, 25 Jul 2019 at 13:22:42 +0200, Santiago Vila wrote:
> > The only thing it did not have was more than one CPU, but AFAIK that's
> > not something that may be considered as a misconfiguration.
> 
> Roughly what proportion of Debian packages are failing to build in
> this environment?
> 
> Roughly how many of the failures are failure to compile (like #924325),
> and how many are failing build-time tests and would likely have built
> successfully if you had been using DEB_BUILD_OPTIONS=nocheck
> (like #907829)?

Sorry for the late reply. I was in the process of collecting this info
but I've had real-life issues which forced me to stop for a while.

I still would like to provide an answer to your question, for the sake
of transparency, but I have some mixed feelings about it:

- As I said in the initial report, I should not have to be doing this.
In my opinion, those who want to deprecate building on single-cpu
systems should be the ones to show everybody else that we *need* to do
that, not the other way around. (But again, I still want to provide an
answer for the sake of transparency).

- This is a fuzzy set, not always very well-defined. When a package
fails to build in any of my autobuilders, it is not always clear
whether having a single CPU is the "reason" for the failure (as in the
p4est package), or whether the package simply does not build ok when
the machine is not super-fast, or whether there is some other reason.

Sometimes the package fails to build randomly on multi-core systems
and almost always on single-cpu systems. Would that be a "FTBFS in
single-CPU systems"? Not in my opinion.

Another example: Are we correctly describing the bug in gcc-8-cross if
we say "fails to build on single-cpu systems"? Not in my opinion.
It is a Makefile bug.

[ BTW: To solve this problem, when I have to report a "weird" FTBFS bug,
  I now offer ssh access to a machine where the failure may be
  reproduced. ].

- I'm also worried about how the answer to the question will be used.
For example: Is it good that most of these bugs happen in the build
stage, or is it better that most of them happen in the dh_auto_test
stage?

Some people tell me: "Oh, no, we are not deprecating Debian on
single-CPU systems, we are only deprecating it for building".

So, apparently we still fully support *using* Debian on single-cpu
systems, but then: What happens if the tests fail on a single-cpu
system? Can we still claim that we support *using* the package
on single-cpu systems when the tests fail on such systems?
That seems a big inconsistency to me.

- Same as before: Is it good that there are many bugs, or is it
better that there are only a few? And good for whom? For those who
want to deprecate building on single-cpu or for those who don't?

I'd like to think that we are not going to deprecate this until there
is a *real* need to do so, i.e. when it's unbearable, when the number
of bugs of this type (which again, is a fuzzy set) is "high enough".

But not before.

Ansgar wrote:

> The environment used by the submitter already fails such an assurance:
> 4% or ~1000+ source package already fail to build in it.  It seems to
> me there is no large practical impact if 10-20 more packages fail to
> build in it due to single-core issues.

No, this is not correct. You are probably misinterpreting what I wrote
here:

https://people.debian.org/~sanvila/single-cpu/

This was just an experiment to show that building on single-CPU may be
as cost-efficient as building on multi-core, but those are not the
only autobuilders I have had running 24/7 over the last few months.

I am actually building all packages, big and small, using either
vendor-provided virtual machines (like the above) or self-hosted KVM
machines (where memory, disk, and CPUs may be chosen at will and
without restrictions).

> [...]
> Most packages are of course fine with less resources, but we are
> talking about requirements for *all* packages.

I'd like us all to be careful with the word "resources".

People have the habit of speaking of "resources" to refer to RAM, disk
and number of CPUs, all in the same bag.

This is not a problem by itself, but it is when we say "your machine
can't build the package because it does not have enough resources"
and the machine has plenty of RAM and plenty of disk. That's tricky.

A package which needs 12 GB of RAM will not build with less, or it
will build 100 times slower swapping all the time. Similarly, a
package which needs 50 GB of disk to build will say "out of disk
space" if we try to build it with less.

By contrast, a package never really "needs" more than one CPU to
build. The only consequence of building with one CPU should be that it
takes more time, but that's all. It should be up to the end users
whether they want to build the packages faster or not; that should
simply not be our concern.

So please let us stop using "resources" to refer to CPUs if at the
same time we are also using the word in phrases like "not enough
resources to build the package" which do not really refer to RAM or
disk.

Simon McVittie wrote:
> The first is that we aren't really discussing whether it's a bug for
> a package to fail to build on a single-CPU machine - I think everyone
> involved agrees that it is. We're discussing whether it is or should
> be a *release-critical* bug.

Actually, I did not file this bug to discuss the RC-ness of a
given bug. If that were the case, I would have filed the bug against
release.debian.org, not tech-ctte.

My complaint is about the *procedure* by which such bugs become non-RC,
which is incredibly sloppy: it is based on "not RC if it builds on
buildd.debian.org", on tradition without enough rationale, on unwritten
rules, on not telling the end user at all, etc. This is radically
different from what we do to deprecate an architecture.

[ I'll try to answer your counterpoints in a later email ].

Thanks.
