
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.



Hi Christian,

On 2025-05-12 10:54, Christian Kastner wrote:
[3]: Sadly, after much trying, it seems that the analog for [1] in
      rootless containers, using the 'podman+rocm' backend, is not
      possible due to some cgroupsv2 restriction. However, I still
      have the code for that, and I guess I could ship it for people who
      want to try it in rootful containers.

I take it you are referring to setting environment variables in podman workers? The ROCR_VISIBLE_DEVICES variable can isolate the GPU at a fairly low level in the ROCm userland [2]. Or do you mean passing through only a subset of the devices in the first place? I forget how you were approaching this.
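
To make the environment-variable approach concrete, here is a rough sketch of what I have in mind. It is not the launcher's actual code: the helper names and the image name are made up for illustration, and I am assuming the worker is started with a plain 'podman run' that passes /dev/kfd and /dev/dri through.

```python
import os
import subprocess
import sys


def env_with_visible_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment with ROCR_VISIBLE_DEVICES set.

    gpu_ids is a list of device indices, e.g. [0] or [0, 2].
    Hypothetical helper, not part of any existing launcher.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCR_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def podman_args(image, gpu_ids, command):
    """Build (but do not run) a podman command line for a ROCm worker.

    Assumption: the container gets /dev/kfd and all of /dev/dri, and the
    GPU selection happens only through the environment variable.
    """
    visible = ",".join(str(i) for i in gpu_ids)
    return [
        "podman", "run", "--rm",
        "--device", "/dev/kfd",
        "--device", "/dev/dri",
        "--env", f"ROCR_VISIBLE_DEVICES={visible}",
        image, *command,
    ]


if __name__ == "__main__":
    # Show that a child process inherits the restricted device list.
    child = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['ROCR_VISIBLE_DEVICES'])"],
        env=env_with_visible_gpus([0]),
        capture_output=True, text=True, check=True,
    )
    print(child.stdout.strip())
    # 'example-rocm-image' is a placeholder image name.
    print(" ".join(podman_args("example-rocm-image", [0], ["rocminfo"])))
```

Note that this is cooperative: it only affects processes that inherit the variable and leave it alone, so it is a convenience for sharing a machine rather than a hard boundary. Passing through only a subset of the device nodes would be the stricter alternative.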

In any case, isolation via rootful containers would probably be useful as an option. I'd like to limit Pinwheel and Arctophylax to a single GPU [3]. They're getting a fair bit of interactive use now, and that would make it easier to share them. It's up to you, though.

Sincerely,
Cory Bloor

[2]: https://rocm.docs.amd.com/en/docs-6.4.0/conceptual/gpu-isolation.html
[3]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Continuous-integration-workers

