Hi Christian,
After swapping the CPU on ci-worker-ckk02 yesterday,
* the gfx1030 card was not properly connected (my fault), and
* the gfx1034 card was assigned the PCI slot ID normally assigned to the aforementioned gfx1030, so the gfx1030 test results are bogus.

Consequently, I will reschedule all tests for gfx1030 and gfx1034.
It might be a good idea for the CI to ensure that hardware with the expected architecture is available before running the tests. I used the wrong PCIe ID in one of my configuration files, which resulted in gfx1032 hardware being used for gfx1012 tests. There is a gfx_target_version listed in /sys/class/kfd/kfd/topology/nodes/*/properties that can be checked.
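As a rough sketch of such a check, something like the following could run before the tests start. The function names are made up for illustration. I am assuming that each properties file contains whitespace-separated "key value" lines, and that gfx_target_version encodes the architecture as major * 10000 + minor * 100 + stepping (so 100300 would be gfx1030). That encoding should be verified against the real hardware before relying on it.

```python
import glob

# Path pattern from the sysfs location mentioned above.
KFD_PROPERTIES_GLOB = "/sys/class/kfd/kfd/topology/nodes/*/properties"


def parse_properties(text):
    """Parse 'key value' lines into a dict of strings (assumed file format)."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def gfx_name(version):
    """Convert a gfx_target_version integer to a name like 'gfx1030'.

    Assumes version = major * 10000 + minor * 100 + stepping, with minor
    and stepping printed as hex digits (e.g. 90010 -> 'gfx90a').
    """
    major = version // 10000
    minor = (version // 100) % 100
    stepping = version % 100
    return f"gfx{major}{minor:x}{stepping:x}"


def available_targets(pattern=KFD_PROPERTIES_GLOB):
    """Return the set of gfx names reported by all KFD topology nodes."""
    targets = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            props = parse_properties(f.read())
        version = int(props.get("gfx_target_version", "0"))
        if version:  # nodes without a GPU are assumed to report 0
            targets.add(gfx_name(version))
    return targets


if __name__ == "__main__":
    expected = "gfx1030"  # hypothetical: the architecture this CI job targets
    found = available_targets()
    if expected not in found:
        raise SystemExit(f"expected {expected}, found {sorted(found)}")
    print(f"{expected} is present")
```

This only confirms that a card of the expected architecture exists somewhere on the machine. It does not confirm that the PCI slot ID in the job configuration points at that card, so a per-device check would still be needed to catch a mix-up like the one above.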
I wonder whether we shouldn't keep a public log of all infrastructure changes somewhere.
I've been meaning to create a wiki page with a list of the machines used for the ROCm Team CI and their hardware configuration. That wouldn't be a log per se, but it would be a start.
Sincerely,
Cory Bloor