Re: ROCm CI: Now also triggering on dependency changes

To: Christian Kastner <ckk@kvr.at>
Cc: debian-ai@lists.debian.org
Subject: Re: ROCm CI: Now also triggering on dependency changes
From: Cordell Bloor <cgmb@slerp.xyz>
Date: Sun, 3 Dec 2023 23:50:22 -0700
Message-id: <2911ce01-89a9-c7ee-5357-10fa9d742121@slerp.xyz>
In-reply-to: <6262f7ce29784efbf7c02b0359485827@kvr.at>
References: <f5c88db4-6622-4f1b-8598-7f6ed2491606@debian.org> <9c235cc1-e4d9-45d4-9119-2face31fff30@debian.org> <457794a3-c349-792a-dba5-e848df44834f@slerp.xyz> <6262f7ce29784efbf7c02b0359485827@kvr.at>

Hi Christian,

Ah. The fact that the guest doesn't know the target architecture does make it a little trickier. I think you're spot-on about the solutions, though.


On 2023-12-03 12:06, Christian Kastner wrote:

> Alternatively, since the host always knows PCI device:vendor IDs (even
> with pass-through) for each slot, I could fully map all possible
> device:vendor IDs to gfxNNNN, and check that in the qemu+rocm backend.

I know of two lists that could be used for that purpose [1][2]. I'm not sure which one is better.

Perhaps you could emit a tmpfail when the device being passed through is known to be the wrong architecture, but merely emit a warning when the device architecture is unknown (i.e., do not emit an error)? That would provide a safety net for common errors without getting in the way when new hardware is added. Or, maybe default to tmpfail for unknown hardware, but add a configuration option to disable the check.
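
To make the suggestion concrete, here is a rough Python sketch of that policy. It is only an illustration under stated assumptions, not the actual qemu+rocm backend code: the names DEVICE_TO_GFX, Verdict, check_device and strict_unknown are all hypothetical, and the two table entries are examples that would need to be checked against (and really generated from) one of the lists in [1][2].

```python
from enum import Enum

# Hypothetical lookup table keyed by (vendor ID, device ID).
# 0x1002 is the AMD/ATI PCI vendor ID. The entries below are illustrative
# only; a real table should be generated from one of the lists in [1][2].
DEVICE_TO_GFX = {
    (0x1002, 0x73BF): "gfx1030",
    (0x1002, 0x66AF): "gfx906",
}


class Verdict(Enum):
    OK = "ok"            # device matches the expected architecture
    WARN = "warn"        # device is not in the table; proceed with a warning
    TMPFAIL = "tmpfail"  # known mismatch (or unknown device in strict mode)


def check_device(vendor_id, device_id, expected_gfx,
                 strict_unknown=False, table=DEVICE_TO_GFX):
    """Decide how to treat a passed-through device.

    strict_unknown is a hypothetical configuration option: when True,
    hardware missing from the table yields tmpfail instead of a warning.
    """
    actual = table.get((vendor_id, device_id))
    if actual is None:
        # Unknown hardware: do not block by default, so that newly added
        # devices do not need a table update before they can be used.
        return Verdict.TMPFAIL if strict_unknown else Verdict.WARN
    if actual != expected_gfx:
        # Known hardware of the wrong architecture: the common
        # misconfiguration that the check is meant to catch.
        return Verdict.TMPFAIL
    return Verdict.OK
```

With the default settings, an unlisted device only produces a warning; passing strict_unknown=True gives the other variant, where unknown hardware is a tmpfail unless the check is relaxed.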

>> I've been meaning to create a wiki page with a list of the machines
>> used for the ROCm Team CI and their hardware configuration. That
>> wouldn't be a log per se, but it would be a start.
>
> That would be great and I'd add my info to that. This would also be
> valuable to package maintainers, I think.


I have created the page and filled in my info [3].

Sincerely,
Cory Bloor

[1]: https://github.com/RadeonOpenCompute/rocminfo/blob/rocm-5.7.1/rocm_agent_enumerator#L12-L75
[2]: https://github.com/ROCmSoftwarePlatform/hipfort/blob/rocm-5.7.1/bin/gputable.txt
[3]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Continuous-integration-workers
