Re: Building llama.cpp for AMD GPU using only Debian packages?

To: Jamie Bainbridge <jamie.bainbridge@gmail.com>, Petter Reinholdtsen <pere@hungry.com>
Cc: debian-ai@lists.debian.org
Subject: Re: Building llama.cpp for AMD GPU using only Debian packages?
From: Cordell Bloor <cgmb@slerp.xyz>
Date: Fri, 31 Jan 2025 07:47:13 -0700
Message-id: <2a0bb4a5-bec1-4abe-b10a-3905afca8a09@slerp.xyz>
In-reply-to: <CAAvyFNjufDunbeV42O6Or4ty7bxXerbwOF_0STbvXSVkqus95w@mail.gmail.com>
References: <sa6sewtv5km.fsf@hjemme.reinholdtsen.name> <a2e3c24b-6459-41cb-995f-93bb4bdfee97@slerp.xyz> <sa6wmm5s28m.fsf@hjemme.reinholdtsen.name> <sa6r0cds1br.fsf@hjemme.reinholdtsen.name> <af3fa2a8-63f0-44ec-bb0e-5105a3235f6b@slerp.xyz> <a6840953-4656-47ed-96f7-ee7fe2f5b247@slerp.xyz> <sa6tth8p2rk.fsf@hjemme.reinholdtsen.name> <sa6a5izpnvp.fsf@hjemme.reinholdtsen.name> <sa64j96q4gg.fsf@hjemme.reinholdtsen.name> <sa6sep3dh5h.fsf@hjemme.reinholdtsen.name> <CAAvyFNjufDunbeV42O6Or4ty7bxXerbwOF_0STbvXSVkqus95w@mail.gmail.com>

Hi Jamie,

On 2025-01-30 17:39, Jamie Bainbridge wrote:

I am not sure this is actually needed anymore.

In the past I found Vulkan on RDNA1 (5600XT) was significantly slower
than ROCm, like half the text generation speed.

I was recently advised to try Vulkan again. I found RDNA1 (5600XT)
Vulkan runs the same speed as ROCm, and RDNA2 (6600XT) Vulkan runs
faster than ROCm by about 10%!

The rocBLAS library depends heavily on tuned Tensile assembly kernels to achieve good performance. What you're seeing on RDNA1 is the performance of the rocBLAS library using what is basically the reference implementation.

On RDNA2, there are assembly kernels, but AFAIK there was never any rocBLAS tuning done for llama.cpp workloads. It's likely that the parameter space is not well-covered and Tensile is forced to select a suboptimal assembly kernel, so the performance could probably be significantly improved through tuning.
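
As a rough illustration of what "parameter space" means here, the sketch below lists the matrix-multiply sizes (M, N, K) that the weight matmuls of one LLaMA-style transformer layer would produce. It is not taken from rocBLAS, Tensile, or llama.cpp: the function names are hypothetical, the dimensions are illustrative values for a model of roughly 7B parameters, and whether a given matmul actually reaches rocBLAS depends on the backend and the quantization in use.

```python
def gemm_shapes(hidden, ffn, n_heads, n_kv_heads, vocab, batch):
    """Return (M, N, K) for the weight matmuls of one LLaMA-style layer.

    M is the output feature count, N the number of tokens processed at
    once, and K the input feature count. All names are illustrative.
    """
    head_dim = hidden // n_heads
    kv_dim = head_dim * n_kv_heads
    return [
        (hidden, batch, hidden),  # Q projection
        (kv_dim, batch, hidden),  # K projection
        (kv_dim, batch, hidden),  # V projection
        (hidden, batch, hidden),  # attention output projection
        (ffn, batch, hidden),     # feed-forward gate
        (ffn, batch, hidden),     # feed-forward up
        (hidden, batch, ffn),     # feed-forward down
        (vocab, batch, hidden),   # output logits
    ]


def distinct_shapes(batches, **dims):
    """Collect the distinct (M, N, K) sizes across several batch sizes."""
    seen = set()
    for batch in batches:
        seen.update(gemm_shapes(batch=batch, **dims))
    return sorted(seen)


if __name__ == "__main__":
    # Assumed, illustrative dimensions; substitute those of the real model.
    dims = dict(hidden=4096, ffn=11008, n_heads=32, n_kv_heads=32, vocab=32000)
    # batch=1 stands in for text generation, batch=512 for prompt processing.
    for m, n, k in distinct_shapes([1, 512], **dims):
        print(f"M={m} N={n} K={k}")
```

A list like this, built from the models people actually run, is the kind of input a tuning effort would need. Sizes that were never tuned are the ones most likely to fall back to a poorly matched kernel.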

The Tensile library has a tremendous amount of technical debt and it's not exactly easy to use. I've never done Tensile tuning before, but three of my close friends from grad school were Tensile developers for a few years [1], so I've asked for a favour. Benjamin Ulmer is going to help tune rocBLAS for llama.cpp on RDNA 1, though whether AMD upstream accepts that tuning is an open question.


What models should we be tuning for?

On 2025-01-30 17:39, Jamie Bainbridge wrote:

AMD's marketing for RDNA3 (7900XTX) also uses Vulkan to spruik
performance gains over competing cards with CUDA. The XTX is even
officially supported in ROCm, so surely if ROCm was faster they'd use
that result instead.

I wouldn't read too much into that. That does suggest that the Vulkan implementation was faster, but we don't know if that's a well-optimized result. I suspect it's not.


Sincerely,
Cory Bloor

[1]: I'd hoped that with so many friends on the team, I'd be able to have some influence on the technical direction of the library. Unfortunately, that proved not to be the case. It was a bit of a life lesson for me.
