Hi Christian,
Thank you for the work you've done on the scheduler and the Debian ROCm CI. In particular, the podman backend you added recently has allowed us to add reliable gfx803, gfx1010 and gfx1035 machines, some of which have helped us to catch and fix serious regressions during the ROCm 5.7.1 update (e.g. #1065410).
I don't think there's anywhere else that you can publicly view how a library like rocblas [1] or rocrand [2] runs on so many different GPU architectures. There's still plenty of work to do to further improve the CI system, but it's hard to overstate how valuable it has been already.
I pushed the following changes today:
* debci-scheduler has a slightly better config file, no longer requiring manual hacks
* Changes to linux-signed-amd64 now trigger tests again, and this was newly enabled for firmware-amd-graphics, too
I saw something odd after I uploaded a fix for rocprim on gfx1031.
The tests were triggered by the upload of rocprim
5.7.1-2~exp2, but they installed librocprim-tests amd64
5.7.1-2~exp1 [3]. This doesn't seem to be a new phenomenon, as
5.7.1-1 appears to have been tested when 5.7.1-2~exp1 was uploaded
last week.
I also noticed that the rocthrust tests were triggered even though
they do not depend on any rocprim package at run-time [4]. I
suppose the scheduler needs to trace the dependency tree via the
source packages for reasons you've discussed in the past, but it's
a good case study to highlight the value of further improvements
to the scheduler.
Sincerely,
Cory Bloor
[1]: https://ci.rocm.debian.net/packages/r/rocblas/
[2]: https://ci.rocm.debian.net/packages/r/rocrand/
[3]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1031/r/rocprim/10204/log.gz
[4]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1031/r/rocthrust/10216/log.gz