debci 3.10+rocm4 uploaded: New gpuenv-aware execution driver
Hi,
I've released a new version of our fork of debci to our APT repo [1].
This version ships with a new gpuenv-aware execution driver for
worker instances. When enabled, worker instances no longer need
exclusive access to their GPUs, so a GPU can be shared among an
arbitrary number of instances. Each worker simply waits until its
GPU(s) are ready for use before executing a job.
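For readers unfamiliar with the pattern, here is a minimal sketch of
"wait until the GPU is ready, then run the job". It is not the
driver's actual code. The announcement doesn't say how readiness is
determined, so `is_ready` is a hypothetical callback. The names
`wait_until_ready` and `run_job_when_ready`, the polling interval,
and the optional timeout are illustrative assumptions, not debci
settings. The default of no timeout mirrors "simply waits".

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until_ready(
    is_ready: Callable[[], bool],
    poll_interval: float = 5.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the (hypothetical) readiness check until it returns True.

    Returns True once the GPU(s) report ready, or False if `timeout`
    seconds elapse first. timeout=None waits indefinitely.
    `sleep` and `clock` are injectable so the loop can be tested.
    """
    deadline = None if timeout is None else clock() + timeout
    while not is_ready():
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_interval)
    return True


def run_job_when_ready(
    job: Callable[[], T], is_ready: Callable[[], bool], **kwargs
) -> Optional[T]:
    """Run job() only after the GPU(s) are ready; return None on timeout."""
    if wait_until_ready(is_ready, **kwargs):
        return job()
    return None


if __name__ == "__main__":
    # Simulated GPU that is busy for two checks, then becomes ready.
    checks = iter([False, False, True])
    print(run_job_when_ready(lambda: "job ran", lambda: next(checks), poll_interval=0))
```

Because the worker blocks instead of failing, a busy GPU only delays
the job rather than turning it into a tmpfail.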
In our fork, we previously added an optional pre-test hook to the
original execution driver so that it could abort testing, rather than
produce a cascade of tmpfails, when it encountered certain error
conditions such as a busy or unresponsive GPU. The new execution
driver works around those cases.
This is a pretty major change, and it is not enabled by default. It's
being published now so that it is available for broader testing.
Best,
Christian
[1]: https://apt.rocm.debian.net