
Re: ROCm on APUs




On 2024-04-02 16:17, Cordell Bloor wrote:
The ROCm packages for Debian are built such that they can run on AMD APUs, but there is a major limitation. Integrated GPUs are often configured with a relatively small amount of initial memory dedicated to the GPU. The APU expects that the memory reserved for the GPU will be adjusted dynamically. Unfortunately, HIP applications will not automatically request more memory to be assigned to the GPU and are therefore stuck with the default allocation.

Carlos Segura has an interesting workaround [1]. Using LD_PRELOAD, they replace hipMalloc / hipFree with hipHostMalloc / hipHostFree to force all device memory allocations to use pinned host memory instead.
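
For readers unfamiliar with the technique, here is a rough sketch of what such an LD_PRELOAD interposer can look like. It is not the code from [1]. To keep it buildable without the HIP headers, it declares a stand-in hipError_t and resolves hipHostMalloc / hipHostFree at run time with dlopen/dlsym. The library name "libamdhip64.so", the error value 2 (assumed to match hipErrorOutOfMemory), and the SHIM_* and lookup names are my assumptions for illustration.

```c
/* shim.c: sketch of an LD_PRELOAD interposer that redirects hipMalloc/hipFree
 * to pinned host memory. Illustrative only; not the code from [1]. */
#include <dlfcn.h>
#include <stddef.h>

typedef int hipError_t;                /* stand-in for the HIP enum; hipSuccess is 0 */
#define SHIM_ERROR_OUT_OF_MEMORY 2     /* assumed to match hipErrorOutOfMemory */
#define SHIM_RUNTIME "libamdhip64.so"  /* assumed name; may need a version suffix */

typedef hipError_t (*host_malloc_fn)(void **, size_t, unsigned int);
typedef hipError_t (*host_free_fn)(void *);

/* Look up a symbol in the real HIP runtime, opening it on first use.
 * Not thread-safe; good enough for a sketch. */
static void *lookup(const char *name)
{
    static void *runtime;
    if (!runtime)
        runtime = dlopen(SHIM_RUNTIME, RTLD_NOW | RTLD_GLOBAL);
    return runtime ? dlsym(runtime, name) : NULL;
}

/* Same signature as the real hipMalloc, so the dynamic linker picks this
 * definition first when the shim is preloaded. */
hipError_t hipMalloc(void **ptr, size_t size)
{
    host_malloc_fn fn = (host_malloc_fn)lookup("hipHostMalloc");
    if (!fn) {
        if (ptr)
            *ptr = NULL;
        return SHIM_ERROR_OUT_OF_MEMORY;
    }
    return fn(ptr, size, 0u);          /* 0 corresponds to hipHostMallocDefault */
}

/* Memory handed out above came from hipHostMalloc, so free it the same way. */
hipError_t hipFree(void *ptr)
{
    host_free_fn fn = (host_free_fn)lookup("hipHostFree");
    return fn ? fn(ptr) : SHIM_ERROR_OUT_OF_MEMORY;
}
```

Something like "cc -shared -fPIC shim.c -o libshim.so -ldl" followed by "LD_PRELOAD=./libshim.so ./app" should be enough to try it. Only allocations that go through the hipMalloc symbol are redirected; other allocation entry points are left untouched.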

After raising the topic here, I briefly discussed this workaround with a number of folks, including Felix Kuehling. He pointed out that replacing hipMalloc with hipHostMalloc is likely to break the CUDA IPC API. He suggested that perhaps they could instead adapt the approach taken in the driver for MI300A to smaller APUs. That is, backing hipMalloc allocations with kernel-allocated system memory buffer objects.

I'm not sure whether it was a result of that conversation or whether the KFD developers were working on this anyway, but it seems that a patch implementing this approach landed for Linux 6.10-rc1 [2][3]. There is at least one user report of successfully running Stable Diffusion without the force-host-allocation hack [4].
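
If you want a quick way to tell whether a machine is likely to have this change, a version check along the following lines may help. It is only a heuristic sketch: the 6.10 threshold comes from [2], the function names are made up for illustration, and a distribution kernel could carry a backport (or lack the patch) regardless of its version string.

```c
/* kver.c: heuristic check for "running kernel is at least MAJOR.MINOR". */
#include <stdio.h>
#include <sys/utsname.h>

/* Parse the leading "MAJOR.MINOR" of a release string such as "6.10.0-rc1".
 * Returns 1 if it is at least major.minor, 0 otherwise or on parse failure. */
int kernel_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return 0;
    return maj > major || (maj == major && min >= minor);
}

/* Same check against the running kernel, using uname(2). */
int running_kernel_at_least(int major, int minor)
{
    struct utsname u;
    if (uname(&u) != 0)
        return 0;
    return kernel_at_least(u.release, major, minor);
}
```

Calling running_kernel_at_least(6, 10) then gives a first hint as to whether the force-host-allocation workaround might still be needed.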

It's nice to see that consumer APUs are benefiting from the work done for MI300A.

Sincerely,
Cory Bloor

[1]: https://github.com/segurac/force-host-alloction-APU

[2]: https://www.phoronix.com/news/Linux-6.10-AMDKFD-Small-APUs
[3]: https://gitlab.freedesktop.org/drm/kernel/-/commit/eb853413d02c8d9b27942429b261a9eef228f005
[4]: https://github.com/ROCm/ROCm/issues/2014#issuecomment-2131988809

