
Re: ROCm and virtual machines?



Hi Petter,

On 2024-02-12 00:55, Petter Reinholdtsen wrote:
I spoke with a colleague the other day about maintaining a GPU high
performance computing cluster, and he mentioned that an important
feature needed is the ability to partition a single GPU across
several VMs, and to get accounting information on the GPU usage to know
how much the allocated GPU is used by a given VM.  He told me this works
ok-ish with NVidia, with a new enough OpenStack version, but he was not
aware of any other GPU vendors providing the same feature.

Does anyone know if ROCm from AMD provides something like this?  Can the GPU
resources/cores be split into several partitions and handed out to
several VMs?  Is there a 'top'-like program (or better yet, something
non-interactive) that can show how much of such a GPU partition is or has
been used by a given VM?

This is not my area of expertise, but the ROCm documentation mentions that VMware ESXi GPU virtualization is supported on MI200 GPUs [1].

There is also MxGPU support (based on SR-IOV) on a handful of older data centre cards like the Radeon PRO V620 (Navi 21), Radeon PRO V520 (Navi 12), and MI25 (Vega 10) [2]. However, I'm a bit unclear on the difference between that and the virtualization support in the MI200 line.

Overall, support for GPU virtualization seems to be limited.
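As for the 'top'-like monitoring part of the question: on bare metal, ROCm ships rocm-smi, which can report utilization non-interactively. Whether these counters are visible from inside a VM partition is exactly the open question, so take this as a sketch of what the tool offers rather than a confirmed answer for the SR-IOV case:

```shell
# Non-interactive GPU usage reporting with rocm-smi (ships with ROCm).
# NOTE: per-VM visibility of these counters under SR-IOV/MxGPU is unverified.
rocm-smi --showuse            # GPU busy percentage per device
rocm-smi --showmeminfo vram   # VRAM total/used per device
rocm-smi --showuse --csv      # machine-readable output for accounting scripts
```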

Sincerely,
Cory Bloor

[1]: https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.2/reference/system-requirements.html#virtualization-support
[2]: https://www.amd.com/en/graphics/workstation-virtual-graphics
