Re: ROCm CI: Now also triggering on dependency changes



Hey Cory,

On 2023-12-03 19:31, Cordell Bloor wrote:
> It might be a good idea for the CI to ensure that hardware with the
> expected architecture is available before running the test. I used the
> wrong pcie id in one of my configuration files, which resulted in
> gfx1032 hardware being used for gfx1012 tests. There is a
> gfx_target_version listed in
> /sys/class/kfd/kfd/topology/nodes/*/properties that can be checked.

That's a good idea. The challenge is how to check this. With QEMU
workers, /sys/class/kfd only exists in the guest, but the test itself
doesn't "know" which architecture is being targeted; that happens on the
host with debci listening to a particular queue.

I guess one could pass an environment variable from host to guest, to be
consumed by the autopkgtest script in the guest. But that would mean the
test is run, and therefore must end with "skipped" or "error".
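
A minimal sketch of what such a guest-side check could look like, in
Python. The variable name ROCM_EXPECTED_GFX and all function names are
hypothetical. The sketch also assumes that gfx_target_version encodes
major * 10000 + minor * 100 + stepping (so 100302 would be gfx1032) and
that CPU-only nodes report 0; both should be verified against real
hardware. Exit code 77 is reported as "skipped" by autopkgtest only if
the test declares "Restrictions: skippable".

```python
import glob

# Path pattern from the quoted suggestion above.
KFD_PROPERTIES = "/sys/class/kfd/kfd/topology/nodes/*/properties"

EXIT_OK = 0
EXIT_SKIP = 77  # autopkgtest treats this as "skipped" with Restrictions: skippable


def gfx_name(target_version):
    """Convert e.g. 100302 to 'gfx1032' (assumed encoding, see lead-in)."""
    major, rest = divmod(target_version, 10000)
    minor, stepping = divmod(rest, 100)
    return f"gfx{major}{minor:x}{stepping:x}"


def parse_properties(text):
    """Parse 'key value' lines of a KFD properties file into a dict."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def available_architectures(pattern=KFD_PROPERTIES):
    """Return the set of gfx names of all GPU nodes visible in the guest."""
    archs = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            props = parse_properties(f.read())
        version = int(props.get("gfx_target_version", "0"))
        if version:  # assumed: CPU-only nodes report 0
            archs.add(gfx_name(version))
    return archs


def exit_code(expected, found):
    """Decide the test's exit status from the expected and found architectures."""
    if not expected:
        return EXIT_OK  # nothing was passed in from the host, so nothing to check
    return EXIT_OK if expected in found else EXIT_SKIP
```

At the top of the test script, this would be used roughly as
sys.exit(exit_code(os.environ.get("ROCM_EXPECTED_GFX"),
available_architectures())) whenever the result is non-zero.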

Alternatively, since the host always knows PCI device:vendor IDs (even
with pass-through) for each slot, I could fully map all possible
device:vendor IDs to gfxNNNN, and check that in the qemu+rocm backend.
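
A rough sketch of the host-side lookup, again in Python. The function
names are hypothetical, and nothing here reflects how the qemu+rocm
backend is actually structured. IDs are written in the vendor:device
order that "lspci -n" prints. The two table entries are illustrative
only and should be checked against the pci.ids database before use.

```python
# Illustrative entries only; a real table would have to cover every
# device used by the CI and be verified against pci.ids.
PCI_ID_TO_GFX = {
    ("1002", "73ff"): "gfx1032",  # assumed: Navi 23
    ("1002", "7340"): "gfx1012",  # assumed: Navi 14
}


def split_pci_id(pci_id):
    """Split 'vendor:device' (as printed by 'lspci -n') into a lowercase tuple."""
    vendor, device = pci_id.strip().lower().split(":")
    return vendor, device


def arch_for_device(pci_id, table=PCI_ID_TO_GFX):
    """Return the gfx architecture for a PCI ID, or None if it is unmapped."""
    return table.get(split_pci_id(pci_id))


def slot_matches_queue(pci_id, queue_arch, table=PCI_ID_TO_GFX):
    """True only if the device in the slot is known and matches the queue's arch."""
    return arch_for_device(pci_id, table) == queue_arch
```

With such a table, the misconfiguration described above (gfx1032
hardware behind a gfx1012 queue) would be caught on the host before the
guest is started: slot_matches_queue("1002:73ff", "gfx1012") is False.
Unmapped devices also fail the check, which seems the safer default.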

>> I wonder whether we shouldn't keep a public log of all infrastructure
>> changes somewhere.
> 
> I've been meaning to create a wiki page with a list of the machines
> used for the ROCm Team CI and their hardware configuration. That
> wouldn't be a log per se, but it would be a start.

That would be great and I'd add my info to that. This would also be
valuable to package maintainers, I think.

Best,
Christian

