
Bug#932795: How to handle FTBFS bugs in release architectures



Hi,

As I'm not sure everyone is aware of what the problem with p4est was, I
decided to write a short summary:

----- Summary ----------------------------------------------------------

- p4est uses MPI, a standard for parallel applications running on
  anything from single machines up to clusters with 100 000+ cores.

  For MPI applications, it is usually a configuration error (resulting
  in degraded performance) to try to use more cores than are available.
  To catch this, Debian's current default MPI implementation, OpenMPI,
  refuses such requests by default:
    `mpirun -np 64 echo Blubb`
  fails with an error when fewer than 64 cores are available.

  One can tell OpenMPI to allow this, as was done in p4est 2.2-1, by
  setting an environment variable when running the tests[1] (see the
  sketch below).  Other MPI-using packages in Debian do the same to run
  their tests during builds; a similar environment variable needs to be
  set so that no error is raised when rsh/ssh is not available in
  $PATH.

    [1]: https://salsa.debian.org/science-team/p4est/commit/8803bea251517354c4cbe737e018fdf73cb27278
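
  A minimal sketch of what such a change typically looks like in
  debian/rules, assuming the Open MPI MCA environment variables
  commonly used for this purpose (the actual p4est commit may differ
  in detail):

    #!/usr/bin/make -f
    # Assumption: these are the usual Open MPI variables for this;
    # the actual p4est change may differ.
    # Allow more MPI ranks than available cores (oversubscription),
    # so the test suite also runs on single-core build machines.
    export OMPI_MCA_rmaps_base_oversubscribe = true
    # Do not fail when rsh/ssh is not available in $PATH.
    export OMPI_MCA_plm_rsh_agent = /bin/false

    %:
    	dh $@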

- For this particular class of bugs (MPI error during build due to
  oversubscription), I don't think there is any disagreement that these
  are bugs.  They have been fixed in various Debian packages by setting
  the environment variable mentioned above.

- The only disagreement here is about the severity of such bug reports.
  Santiago Vila (the submitter) wants them to be release-critical;
  Adrian Bunk disagrees, as he considers single-core build environments
  to be so rare that a build failure in such an environment should not
  be release-critical (just as build failures due to lack of RAM or
  disk space in the same environment are not).

- (The maintainers of p4est have said nothing about the severity
  dispute; neither Santiago nor Adrian maintains the package.)

----- Opinion ----------------------------------------------------------

- As far as I understand, the main argument for making build failures
  on single-core systems release-critical is that "we are currently
  forcing users to spend extra money if they want *assurance* that all
  the packages [...] will build"[2].

    [2]: https://lists.debian.org/debian-ctte/2019/07/msg00024.html

  I do not think this is a good argument: from a quick search, any
  environment that might give such an assurance will very likely
  already have more than a single core.  (I assume one wants 4+ GB of
  RAM; it is hard to find single-core systems or VMs in the cloud with
  that.)

  The environment used by the submitter already fails such an
  assurance: about 4%, or ~1000+ source packages, already fail to build
  in it.  It seems to me there is no large practical impact if 10-20
  more packages fail to build in it due to single-core issues.  (As far
  as I understand, these issues are very rare; please correct me if
  this affects significantly more packages.)

- Historically, the upper bound for resource requirements for builds
  (RAM, disk space) has been whatever the current buildds offer.  It
  seems unlikely we could commit to lower limits, as some packages
  barely build on the buildds as it is; some already require
  workarounds (such as disabling debug information for some
  environments).

  Most packages are of course fine with fewer resources, but we are
  talking about requirements for *all* packages.

- While we could make build failures on single-core systems
  release-critical, I believe this is not warranted, as the practical
  impact of such build failures does not seem very high to me: one can
  simply build the affected packages in an environment suitable for
  "large" packages.  (Such bugs should of course still be fixed, but
  that does not require them to be release-critical.)

Ansgar

