
Re: ROCm CI: Now also triggering on dependency changes



Hi Christian,

On 2023-12-03 02:59, Christian Kastner wrote:
> after swapping the CPU on ci-worker-ckk02 yesterday,
>   * the gfx1030 card was not properly connected (my fault)
>   * the gfx1034 card was assigned the PCI slot ID normally assigned
>     to the aforementioned gfx1030, so the gfx1030 test results are bogus
>
> Consequently, I will reschedule all tests for gfx1030 and gfx1034.

It might be a good idea for the CI to ensure that hardware with the expected architecture is available before running a test. I used the wrong PCIe ID in one of my configuration files, which resulted in gfx1032 hardware being used for gfx1012 tests. There is a gfx_target_version listed in /sys/class/kfd/kfd/topology/nodes/*/properties that can be checked.
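
A rough sketch of what such a check could look like, in Python. It assumes each properties file holds whitespace-separated "key value" lines, that CPU-only nodes report a gfx_target_version of 0, and that the value encodes the architecture as major * 10000 + minor * 100 + stepping (so 100300 would be gfx1030). The function names are made up for illustration and are not part of any existing CI tooling.

```python
import glob

# Default location of the KFD topology node properties on a ROCm host.
KFD_PATTERN = "/sys/class/kfd/kfd/topology/nodes/*/properties"


def parse_properties(text):
    """Parse 'key value' lines into a dict (assumed file format)."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def gfx_name(target_version):
    """Convert a gfx_target_version integer into a gfxNNNN name.

    Assumes the encoding major * 10000 + minor * 100 + stepping, with
    minor and stepping written as single hex digits in the name.
    """
    major, rest = divmod(target_version, 10000)
    minor, stepping = divmod(rest, 100)
    return f"gfx{major}{minor:x}{stepping:x}"


def available_architectures(pattern=KFD_PATTERN):
    """Return the gfx names of all GPU nodes matched by the pattern."""
    archs = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                props = parse_properties(f.read())
        except OSError:
            continue  # unreadable node: skip it rather than abort
        version = int(props.get("gfx_target_version", "0"))
        if version:  # 0 is assumed to mean a CPU-only node
            archs.append(gfx_name(version))
    return archs


def has_architecture(expected, archs):
    """True if the expected architecture (e.g. 'gfx1030') is present."""
    return expected in archs
```

A test runner could call has_architecture("gfx1012", available_architectures()) before starting a job and refuse to run on a mismatch. This only confirms that the architecture is present somewhere on the host. Confirming that the specific device passed through to the test is the right one would need an additional check.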

> I wonder whether we shouldn't keep a public log of all infrastructure
> changes somewhere.

I've been meaning to create a wiki page with a list of the machines used for the ROCm Team CI and their hardware configuration. That wouldn't be a log per se, but it would be a start.

Sincerely,
Cory Bloor

