
Re: ROCm and virtual machines?



Hi Petter,

On 2024-02-12 00:55, Petter Reinholdtsen wrote:
I spoke with a colleague the other day about maintaining a GPU high
performance computing cluster, and he mentioned that an important
feature needed is the ability to partition a single GPU across
several VMs, and to get accounting information on the GPU usage to know
how much the allocated GPU is used by a given VM.  He told me this works
ok-ish with NVidia, with a new enough OpenStack version, but he was not
aware of any other GPU vendors providing the same feature.

Does anyone know if ROCm from AMD provides something like this?  Can the GPU
resources/cores be split into several partitions and handed out to
several VMs?  Is there a 'top'-like program (or better yet, something
non-interactive) that can show how much of such a GPU partition is or has
been used by a given VM?

This is not my area of expertise, but the ROCm documentation mentions that VMware ESXi GPU virtualization is supported on MI200 GPUs [1].

There is also MxGPU support (based on SR-IOV) on a handful of older data centre cards like the Radeon PRO V620 (Navi 21), Radeon PRO V520 (Navi 12), and MI25 (Vega 10) [2]. However, I'm a bit unclear on the difference between that and the virtualization support in the MI200 line.

Overall, support for GPU virtualization seems to be limited.
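As for the 'top'-like monitoring part of the question: on bare metal, ROCm ships rocm-smi, which can report utilization non-interactively. Whether these counters are visible from inside a VM partition is exactly the open question, so take this as a sketch of what the tool offers rather than a confirmed answer for the SR-IOV case:

```shell
# Non-interactive GPU usage reporting with rocm-smi (ships with ROCm).
# NOTE: per-VM visibility of these counters under SR-IOV/MxGPU is unverified.
rocm-smi --showuse            # GPU busy percentage per device
rocm-smi --showmeminfo vram   # VRAM total/used per device
rocm-smi --showuse --csv      # machine-readable output for accounting scripts
```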

Sincerely,
Cory Bloor

[1]: https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.2/reference/system-requirements.html#virtualization-support
[2]: https://www.amd.com/en/graphics/workstation-virtual-graphics
