
Re: Building llama.cpp for AMD GPU using only Debian packages?



Hi Jamie,

On 2025-01-30 17:39, Jamie Bainbridge wrote:
> I am not sure this is actually needed anymore.
>
> In the past I found Vulkan on RDNA1 (5600XT) was significantly slower
> than ROCm, like half the text generation speed.
>
> I was recently advised to try Vulkan again. I found RDNA1 (5600XT)
> Vulkan runs the same speed as ROCm, and RDNA2 (6600XT) Vulkan runs
> faster than ROCm by about 10%!

The rocBLAS library depends heavily on tuned Tensile assembly kernels to achieve good performance. What you're seeing on RDNA1 is the performance of the rocBLAS library using what is basically the reference implementation.

On RDNA2, there are assembly kernels, but AFAIK no rocBLAS tuning was ever done for llama.cpp workloads. The parameter space is likely not well covered, so Tensile is forced to select a suboptimal assembly kernel. Performance could probably be improved significantly through tuning.

The Tensile library has a tremendous amount of technical debt and it's not exactly easy to use. I've never done Tensile tuning before, but three of my close friends from grad school were Tensile developers for a few years [1], so I've asked for a favour. Benjamin Ulmer is going to help tune rocBLAS for llama.cpp on RDNA1, though whether AMD upstream accepts that tuning is an open question.

What models should we be tuning for?
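
As a rough illustration of why the model choice matters: tuning is done for particular matrix-multiply sizes, and those sizes follow from the model's dimensions. The sketch below enumerates the (M, N, K) shapes of the main weight multiplications for one set of dimensions. The `ModelDims` class and `gemm_shapes` function are hypothetical names made up for this example, the dimensions are the published Llama 2 7B ones used purely as a sample, and the shape list is an assumption about a plain transformer layer (no grouped-query attention). It is not taken from llama.cpp or Tensile, and it does not show which of these calls llama.cpp actually routes through rocBLAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDims:
    """Hypothetical container for the dimensions that determine GEMM shapes."""
    hidden: int  # embedding / residual width
    ffn: int     # feed-forward intermediate width
    vocab: int   # vocabulary size (output projection rows)


def gemm_shapes(dims, batch_tokens):
    """Return the distinct (M, N, K) sizes, assuming C[MxN] = W[MxK] @ X[KxN].

    N is the number of tokens processed at once: 1 for token-by-token
    generation, larger for prompt processing.
    """
    n = batch_tokens
    shapes = {
        (dims.hidden, n, dims.hidden),  # attention Q/K/V/output projections
        (dims.ffn, n, dims.hidden),     # feed-forward gate and up projections
        (dims.hidden, n, dims.ffn),     # feed-forward down projection
        (dims.vocab, n, dims.hidden),   # final logits projection
    }
    return sorted(shapes)


# Sample dimensions (Llama 2 7B); swap in whichever models are chosen.
llama2_7b = ModelDims(hidden=4096, ffn=11008, vocab=32000)

for tokens in (1, 512):
    for m, n, k in gemm_shapes(llama2_7b, tokens):
        print(f"tokens={tokens:4d}  M={m:6d}  N={n:4d}  K={k:6d}")
```

Each model family adds its own set of sizes, so a short list of target models keeps the tuning space manageable.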

On 2025-01-30 17:39, Jamie Bainbridge wrote:
> AMD's marketing for RDNA3 (7900XTX) also uses Vulkan to spruik
> performance gains over competing cards with CUDA. The XTX is even
> officially supported in ROCm, so surely if ROCm was faster they'd use
> that result instead.

I wouldn't read too much into that. That does suggest that the Vulkan implementation was faster, but we don't know if that's a well-optimized result. I suspect it's not.

Sincerely,
Cory Bloor

[1]: I'd hoped that with so many friends on the team, I'd be able to have some influence on the technical direction of the library. Unfortunately, that proved not to be the case. It was a bit of a life lesson for me.

