
Re: CI: Updates to the scheduler



Hi Christian,

Thank you for the work you've done on the scheduler and the Debian ROCm CI. In particular, the podman backend you added recently has allowed us to add reliable gfx803, gfx1010 and gfx1035 machines, some of which have helped us to catch and fix serious regressions during the ROCm 5.7.1 update (e.g. #1065410).

I don't think there's anywhere else where you can publicly view how a library like rocblas [1] or rocrand [2] runs on so many different GPU architectures. There's still plenty of work to do to further improve the CI system, but it's hard to overstate how valuable it has been already.

On 2024-03-24 11:03, Christian Kastner wrote:
> I pushed the following changes today:
>   * debci-scheduler has a slightly better config file, no longer
>     requiring manual hacks
>   * Changes to linux-signed-amd64 now trigger tests again, and
>     this was newly enabled for firmware-amd-graphics, too

I saw something odd after I uploaded a fix for gfx1031 in rocprim. The tests were triggered by the upload of rocprim 5.7.1-2~exp2, but they installed librocprim-tests amd64 5.7.1-2~exp1 [3], i.e. the previous version rather than the one that triggered the run. This doesn't appear to be a new phenomenon, as it seems that 5.7.1-1 was tested when 5.7.1-2~exp1 was uploaded last week.

I also noticed that the rocthrust tests were triggered even though they do not depend on any rocprim package at run-time [4]. I suppose the scheduler needs to trace the dependency tree via the source packages for reasons you've discussed in the past, but it's a good case study that highlights the value of further improvements to the scheduler.

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/packages/r/rocblas/
[2]: https://ci.rocm.debian.net/packages/r/rocrand/
[3]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1031/r/rocprim/10204/log.gz
[4]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1031/r/rocthrust/10216/log.gz

