
Re: Building llama.cpp for AMD GPU using only Debian packages?



On Tue, 28 Jan 2025 at 08:25, Petter Reinholdtsen <pere@hungry.com> wrote:
>
> Here is a small update on building llama.cpp with support for (at least
> my) AMD GPU.  The cmake arguments for the latest github edition (commit
> d6d24cd9ed6d0b9558643dcc28f2124bef488c52) have changed slightly since
> the first recipe in this thread, so here is the one I used to
> successfully build
>
>   HIPCXX=clang-17 cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release \
>     -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100  && \
>     make -C build -j32
>
> I picked the cmake arguments from docs/build.md, but the ones listed
> there had to be modified to use clang-17 instead of clang (ie v19), and
> I did not need to specify HIP_PATH.
>
> --
> Happy hacking
> Petter Reinholdtsen

I am not sure this is actually needed anymore.

In the past I found Vulkan on RDNA1 (5600XT) was significantly slower
than ROCm, like half the text generation speed.

I was recently advised to try Vulkan again. I found RDNA1 (5600XT)
Vulkan runs the same speed as ROCm, and RDNA2 (6600XT) Vulkan runs
faster than ROCm by about 10%!

AMD's marketing for RDNA3 (7900XTX) also uses Vulkan to spruik
performance gains over competing cards running CUDA. The XTX is even
officially supported in ROCm, so surely if ROCm were faster they'd use
that result instead.

It seems that, at least for RDNA1 and newer, we're better off using
Vulkan with llama.cpp now.

I wonder if you can reproduce these results? I tested with Q8_0 and
Q6_K_L models, all of which fit into VRAM.
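In case it helps with reproducing, this is roughly how I compare the two
backends: build llama.cpp once per backend, then run llama-bench on the
same model from each build. The model path and build directory names
below are placeholders, not what the thread used:

```shell
# Run the bundled llama-bench tool from a ROCm build and a Vulkan build
# against the same GGUF model; -ngl 99 offloads all layers to the GPU,
# so the numbers reflect pure GPU text-generation speed.
./build-rocm/bin/llama-bench   -m models/some-model-Q8_0.gguf -ngl 99
./build-vulkan/bin/llama-bench -m models/some-model-Q8_0.gguf -ngl 99
```

Comparing the reported tokens/second between the two runs should show
whether Vulkan keeps up with ROCm on your card.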

I just tested quickly with the LM Studio Vulkan runtime. I had
previously compiled llama.cpp in an Ubuntu podman container with the
official Vulkan SDK, then ran that container on Debian 12. I am not
sure of the steps to compile llama.cpp for Vulkan using only Debian
libraries.
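For what it's worth, a sketch of what a Debian-only Vulkan build might
look like, mirroring the cmake recipe earlier in the thread. The package
names are my guess at what provides the Vulkan headers, loader and the
glslc shader compiler the GGML Vulkan backend needs; I have not verified
this on Debian 12:

```shell
# Untested sketch: install the Vulkan dev packages from Debian,
# then build llama.cpp with the Vulkan backend instead of HIP.
sudo apt install build-essential cmake libvulkan-dev glslc
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON
make -C build -j32
```

If that works, it would answer the original question in this thread for
the Vulkan backend at least, with no ROCm packages required.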

Jamie

