Bug#932795: How to handle FTBFS bugs in release architectures
Hi,
as I'm not sure everyone is aware of what the problem with p4est was, I
decided to write a short summary:
----- Summary ----------------------------------------------------------
- p4est uses MPI, a standard for parallel applications running on single
machines up to clusters with 100 000+ cores.
For MPI applications, it is usually a configuration error (resulting
in degraded performance) to try to use more cores than are available.
To avoid this, Debian's current default MPI implementation, OpenMPI,
by default gives an error when trying to do so:
`mpirun -np 64 echo Blubb`
will give an error when fewer than 64 cores are available.
One can tell OpenMPI to allow oversubscription, as was done in p4est
2.2-1 by setting an environment variable when running the tests[1].
This is the same approach other MPI-using programs in Debian take to
run tests during builds; a similar environment variable needs to be
set to avoid an error when rsh/ssh is not available in $PATH.
[1]: https://salsa.debian.org/science-team/p4est/commit/8803bea251517354c4cbe737e018fdf73cb27278
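  For illustration, a workaround of this kind could look roughly like
  the following fragment in debian/rules (a sketch only; the exact
  variables set in the p4est commit may differ, see [1] for the actual
  change):

```shell
# Allow OpenMPI to start more ranks than there are available cores,
# so tests pass on single-core buildds instead of erroring out.
export OMPI_MCA_rmaps_base_oversubscribe=1
# Prevent OpenMPI from trying to launch processes via rsh/ssh,
# which are typically not present in a minimal build chroot.
export OMPI_MCA_plm_rsh_agent=/bin/false
```

  These MCA parameters can equivalently be passed on the command line
  (e.g. `mpirun --oversubscribe`), but exporting them from
  debian/rules covers all test invocations at once.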
- For this particular class of bugs (MPI error during build due to
oversubscription), I don't think there is any disagreement that these
are bugs. They have been fixed in various Debian packages by setting
the environment variable mentioned above.
- The only disagreement here is about the severity of the bug report.
Santiago Vila (submitter) wants them to be release-critical; Adrian
Bunk disagrees as he considers single-core build environments to be so
rare that a build failure in such an environment should not be
release-critical (just as build failures due to lack of RAM, disk
space in the same environment).
- (The maintainers of p4est have said nothing about the severity
dispute; neither Santiago nor Adrian maintain the package.)
----- Opinion ----------------------------------------------------------
- As far as I understood, the main argument for making build failures on
single-core systems release critical is that "we are currently forcing
users to spend extra money if they want *assurance* that all the
packages [...] will build"[2].
[2]: https://lists.debian.org/debian-ctte/2019/07/msg00024.html
I do not think this is a good argument: from a quick search any
environment that might give such an assurance will very likely already
have more than a single core. (I assumed one wants 4+ GB RAM; it is
hard to find single-core systems or VMs in the cloud with that.)
The environment used by the submitter already fails such an assurance:
4%, or ~1000+ source packages, already fail to build in it. It seems
me there is no large practical impact if 10-20 more packages fail to
build in it due to single-core issues. (As far as I understand, these
issues are very rare; please correct me if this affects significantly
more packages.)
- Historically the upper bound for resource requirements for builds
(RAM, disk space) has been whatever the current buildds offer. It
seems unlikely we would move to lower bounds, as some packages barely
build on the buildds as it is; some already require workarounds (such
as disabling debug information in some environments).
Most packages are of course fine with less resources, but we are
talking about requirements for *all* packages.
- While we could make build failures on single-core systems release
critical, I believe that this is not warranted as the practical
implications of such build failures do not seem very high to me: one
would just build them in an environment suitable for "large" packages.
(Such bugs should of course still be fixed, but that doesn't require
them to be release critical.)
Ansgar