Hi Jamie,
> Though whether AMD upstream accepts that tuning is an open question.
> Open source software lives or dies by its community. With an
> inattentive upstream as you describe above, it makes me wonder if
> effort spent in ROCm is worth it.
I will make no excuses for rocblas. They have repeatedly ignored
good patches that fix problems on unsupported GPUs (e.g., [1][2]).
It's perhaps worth noting that in [2], it was not my decision to
deny the merge. I was just the messenger. If it were up to me, the
PR would have been merged within a few weeks of being submitted.
> In my books, rocblas has one last chance before I support a
> community fork. ROCm is open source. It's not that hard to replace
> any of AMD's libraries. It would be trivial to package a rocblas
> fork for Debian.
> ggerganov has also said llama.cpp is just maintaining ROCm support
> and future effort will be put into Vulkan inference. If Vulkan
> prompt processing could be sped up (maybe like jart's matmul work?
> https://justine.lol/matmul/) then there would be little need to use
> ROCm for llama.cpp at all.
The performance of the Vulkan implementation is the same as
untuned rocBLAS, which suggests to me that the performance is
probably quite poor.
The problem is that the parameterizations required for best
performance will be different for every single GPU architecture.
You can almost think of the rocBLAS library as nothing more than a
big database of the best parameterizations for GEMM operations on a
particular piece of hardware. The rocBLAS repo is 250k lines of C++
code and 5 million lines of YAML describing which parameterization
is best for a given GEMM.
To achieve the maximal performance using Vulkan, llama.cpp would
have to turn itself into a GEMM parameter database too. There's
nothing stopping llama.cpp from doing that; however, the annoying
thing is that every application would have to do the same. It
makes far more sense to have the GEMM database reside within the
BLAS library, where it can be shared between multiple programs.
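In code, such a database amounts to little more than a keyed
lookup. Here is a minimal sketch of the idea; the entry values,
field names, and architecture strings are invented for
illustration, and the real selection logic in rocBLAS/Tensile is
far more elaborate (size buckets, transpose cases, data types, and
so on):

```cpp
// Hypothetical sketch of a GEMM parameter database: map a
// (GPU architecture, problem size) key to the kernel tile
// configuration that offline benchmarking found fastest.
// All entries below are made up for illustration.
#include <cassert>
#include <map>
#include <string>
#include <tuple>

struct GemmParams {
    int tile_m;    // macro-tile rows per workgroup
    int tile_n;    // macro-tile columns per workgroup
    int unroll_k;  // depth of the k-loop unroll
};

// Key: (architecture, M, N, K) for an exact-size match.
using Key = std::tuple<std::string, int, int, int>;

// A tiny stand-in for the millions of YAML lines: one tuned
// entry per (arch, size) combination that was benchmarked.
static const std::map<Key, GemmParams> kTunedEntries = {
    {{"gfx906", 4096, 4096, 4096}, {128, 128, 8}},
    {{"gfx1030", 4096, 4096, 4096}, {64, 64, 16}},
};

// Return the tuned parameters, or fall back to a safe generic
// tile when the hardware/size combination was never benchmarked
// (the "untuned" case: correct results, poor performance).
GemmParams select_gemm_params(const std::string& arch,
                              int m, int n, int k) {
    auto it = kTunedEntries.find({arch, m, n, k});
    if (it != kTunedEntries.end())
        return it->second;
    return {32, 32, 4};  // generic fallback
}
```

The point of keeping this table inside the BLAS library is visible
even in the sketch: the table is keyed only on hardware and problem
shape, not on anything application-specific, so every caller can
share the same tuning data.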
Sincerely,
Cory Bloor
[1]: https://github.com/ROCm/rocBLAS/pull/1455
[2]: https://github.com/ROCm/rocBLAS/pull/1251