
Re: ROCm CI: Now also triggering on dependency changes



Hi Christian,

Ah. The fact that the guest doesn't know the target architecture does make it a little trickier. I think you're spot-on about the solutions, though.

On 2023-12-03 12:06, Christian Kastner wrote:
> Alternatively, since the host always knows PCI device:vendor IDs (even
> with pass-through) for each slot, I could fully map all possible
> device:vendor IDs to gfxNNNN, and check that in the qemu+rocm backend.

I know of two lists that could be used for that purpose [1][2]. I'm not sure which one is better.

Perhaps you could emit a tmpfail when the device being passed through is known to be the wrong architecture, but merely emit a warning when the device architecture is unknown (i.e., do not emit an error)? That would provide a safety net for common errors without getting in the way when new hardware is added. Or, maybe default to tmpfail for unknown hardware, but add a configuration option to disable the check.
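
To make that concrete, here is a rough Python sketch of the check. Everything in it is an assumption for illustration: the names (PCI_TO_GFX, Verdict, check_passthrough, strict_unknown) are hypothetical and not part of the existing backend, and the three table entries are only examples that should be checked against, or generated from, one of the lists in [1][2].

```python
from enum import Enum

# Illustrative (vendor ID, device ID) -> gfx architecture entries.
# A real table would be generated from one of the lists in [1][2];
# these three rows are examples and should be verified against them.
PCI_TO_GFX = {
    (0x1002, 0x66AF): "gfx906",   # Vega 20
    (0x1002, 0x73BF): "gfx1030",  # Navi 21
    (0x1002, 0x744C): "gfx1100",  # Navi 31
}


class Verdict(Enum):
    OK = "ok"            # device is known and matches the expected arch
    WARN = "warn"        # device is unknown; run the test anyway
    TMPFAIL = "tmpfail"  # device is known to be wrong, or unknown in strict mode


def check_passthrough(vendor_id, device_id, expected_gfx, strict_unknown=False):
    """Decide what to do before handing a PCI device to the guest.

    strict_unknown is a hypothetical configuration option: when True,
    hardware missing from the table is treated as a tmpfail instead of
    a warning.
    """
    actual = PCI_TO_GFX.get((vendor_id, device_id))
    if actual is None:
        return Verdict.TMPFAIL if strict_unknown else Verdict.WARN
    if actual != expected_gfx:
        return Verdict.TMPFAIL
    return Verdict.OK
```

With this shape, a worker expecting gfx1030 that is handed a device listed as gfx906 gets a tmpfail, while a brand-new card that is not in the table yet only produces a warning unless strict_unknown is set.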

>> I've been meaning to create a wiki page with a list of the machines
>> used for the ROCm Team CI and their hardware configuration. That
>> wouldn't be a log per se, but it would be a start.
> That would be great and I'd add my info to that. This would also be
> valuable to package maintainers, I think.

I have created the page and filled in my info [3].

Sincerely,
Cory Bloor

[1]: https://github.com/RadeonOpenCompute/rocminfo/blob/rocm-5.7.1/rocm_agent_enumerator#L12-L75
[2]: https://github.com/ROCmSoftwarePlatform/hipfort/blob/rocm-5.7.1/bin/gputable.txt
[3]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Continuous-integration-workers

