Re: ROCm on APUs
On 2024-04-02 16:17, Cordell Bloor wrote:
The ROCm packages for Debian are built such that they can run on AMD
APUs. However, there is a major limitation: integrated GPUs are often
configured with a relatively small amount of initial memory dedicated
to the GPU. The APU expects that the memory reserved for the GPU will
be adjusted dynamically. Unfortunately, HIP applications will not
automatically request more memory to be assigned to the GPU and are
therefore stuck with the default allocation.
Carlos Segura has an interesting workaround [1]. Using LD_PRELOAD,
they replace hipMalloc / hipFree with hipHostMalloc / hipHostFree to
force all device memory allocations to use pinned host memory instead.
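For anyone unfamiliar with the technique, here is a minimal sketch of what such a shim could look like. It is not Carlos's code. The file name, the stand-in typedef for hipError_t and the numeric constants are my own assumptions, declared locally so the example builds without the ROCm headers. It also assumes the HIP runtime can be opened as libamdhip64.so.

```c
/* hip_host_shim.c: hypothetical LD_PRELOAD shim, for illustration only. */
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins for HIP definitions (assumptions, not taken from the HIP headers). */
typedef int hip_status_t;            /* assumed ABI-compatible with hipError_t */
#define SHIM_HIP_NOT_INITIALIZED 3   /* assumed to match hipErrorNotInitialized */
#define SHIM_HOST_MALLOC_DEFAULT 0u  /* assumed to match hipHostMallocDefault */

typedef hip_status_t (*host_malloc_fn)(void **, size_t, unsigned int);
typedef hip_status_t (*host_free_fn)(void *);

/* Look up a symbol in the HIP runtime. Returns NULL if the runtime or the
 * symbol is unavailable. Not thread-safe; a real shim would guard this. */
static void *hip_symbol(const char *name)
{
    static void *runtime;
    if (runtime == NULL)
        runtime = dlopen("libamdhip64.so", RTLD_LAZY | RTLD_GLOBAL);
    return runtime != NULL ? dlsym(runtime, name) : NULL;
}

/* Interposed hipMalloc: serve the request from pinned host memory. */
hip_status_t hipMalloc(void **ptr, size_t size)
{
    host_malloc_fn host_malloc = (host_malloc_fn)hip_symbol("hipHostMalloc");
    if (host_malloc == NULL)
        return SHIM_HIP_NOT_INITIALIZED;
    return host_malloc(ptr, size, SHIM_HOST_MALLOC_DEFAULT);
}

/* Interposed hipFree: release memory obtained through the shim above. */
hip_status_t hipFree(void *ptr)
{
    host_free_fn host_free = (host_free_fn)hip_symbol("hipHostFree");
    if (host_free == NULL)
        return SHIM_HIP_NOT_INITIALIZED;
    return host_free(ptr);
}
```

Something like `cc -shared -fPIC hip_host_shim.c -o libhip_host_shim.so -ldl` followed by `LD_PRELOAD=./libhip_host_shim.so ./my_hip_app` should exercise it (both file names are placeholders). Because the preloaded library is searched first, the application's calls to hipMalloc and hipFree resolve to these definitions rather than the ones in the HIP runtime.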
After raising the topic here, I briefly discussed this workaround with a
number of folks, including Felix Kuehling. He pointed out that replacing
hipMalloc with hipHostMalloc is likely to break the CUDA IPC API. He
suggested that perhaps they could instead adapt the approach taken in
the driver for MI300A to smaller APUs. That is, backing hipMalloc
allocations with kernel-allocated system memory buffer objects.
I'm not sure whether it was as a result of that conversation, or if the
KFD developers were working on this anyway, but it seems that a patch
implementing this approach landed for Linux 6.10-rc1 [2][3]. There is at
least one user report of successfully running Stable Diffusion without
the force-host-allocation hack [4].
It's nice to see that consumer APUs are benefiting from the work done
for MI300A.
Sincerely,
Cory Bloor
[1]: https://github.com/segurac/force-host-alloction-APU
[2]: https://www.phoronix.com/news/Linux-6.10-AMDKFD-Small-APUs
[3]: https://gitlab.freedesktop.org/drm/kernel/-/commit/eb853413d02c8d9b27942429b261a9eef228f005
[4]: https://github.com/ROCm/ROCm/issues/2014#issuecomment-2131988809