
Re: Building llama.cpp for AMD GPU using only Debian packages?



Hi Jamie,

On 2025-01-31 23:13, Jamie Bainbridge wrote:
> Though whether AMD upstream
> accepts that tuning is an open question.
>
> Open source software lives or dies by its community. With an
> inattentive upstream as you describe above, it makes me wonder if
> effort spent in ROCm is worth it.

I will make no excuses for rocBLAS. They have repeatedly ignored good patches that fix problems on unsupported GPUs (e.g., [1][2]). It's perhaps worth noting that in [2], it was not my decision to deny the merge. I was just the messenger. If it were up to me, the PR would have been merged within a few weeks of being submitted.

In my books, rocBLAS has one last chance before I support a community fork. ROCm is open source. It's not that hard to replace any of AMD's libraries. It would be trivial to package a rocBLAS fork for Debian.

> ggerganov has also said llama.cpp is just maintaining ROCm support and
> future effort will be put into Vulkan inference.

> If Vulkan prompt processing could be sped up (maybe like jart's matmul
> work? https://justine.lol/matmul/) then there would be little need to
> use ROCm for llama.cpp at all.

The performance of the Vulkan implementation is the same as untuned rocBLAS, which suggests to me that the performance is probably quite poor.

The problem is that the parameterizations required for best performance will be different for every single GPU architecture. You can almost think of the rocBLAS library as nothing more than a big database of what the best parameterizations are for GEMM operations on a particular piece of hardware. The rocBLAS repo is 250k LOC of C++ code and 5 million lines of YAML describing which parameterization is best for a given GEMM.
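To make the "database" framing concrete, here is a minimal sketch of the lookup idea in Python. The table keys, parameter names, and values below are all illustrative assumptions, not the real rocBLAS/Tensile schema; the point is only that tuned shapes hit fast kernel parameters while everything else falls back to a generic (slower) choice:

```python
# Hypothetical sketch of a GEMM tuning database, loosely modelled on the
# idea behind rocBLAS's YAML files. All arch names, shapes, and parameter
# names here are made up for illustration.

GEMM_DB = {
    # (gpu_arch, m, n, k) -> kernel parameters found by offline benchmarking
    ("gfx90a", 4096, 4096, 4096): {"macro_tile": (128, 128), "depth_u": 16},
    ("gfx90a", 1024, 1024, 1024): {"macro_tile": (64, 64), "depth_u": 8},
}

# What you get on a GPU (or problem shape) the database doesn't cover:
# correct results, but nowhere near peak performance.
GENERIC_FALLBACK = {"macro_tile": (32, 32), "depth_u": 8}

def select_gemm_params(arch, m, n, k):
    """Return tuned kernel parameters for an exact match in the database,
    else the generic fallback."""
    return GEMM_DB.get((arch, m, n, k), GENERIC_FALLBACK)
```

An unsupported GPU like those in [1][2] is, in this picture, an architecture with no entries in the table: every call lands on the fallback.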

To achieve maximal performance using Vulkan, llama.cpp would have to turn itself into a GEMM parameter database too. There's nothing stopping llama.cpp from doing that; the annoying thing, however, is that every application would have to do the same. It makes far more sense to have the GEMM database reside within the BLAS library, where it can be shared between multiple programs.

Sincerely,
Cory Bloor

[1]: https://github.com/ROCm/rocBLAS/pull/1455
[2]: https://github.com/ROCm/rocBLAS/pull/1251

