Hi Petter,
Anyway, now I get a crash in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1. These are the last lines from a valgrind run:
==27243== Warning: set address range perms: large range [0x221c55000, 0x231e56000) (noaccess)
llama_kv_cache_init: ROCm0 KV buffer size = 256,00 MiB
llama_new_context_with_model: KV self size = 256,00 MiB, K (f16): 128,00 MiB, V (f16): 128,00 MiB
llama_new_context_with_model: ROCm_Host output buffer size = 0,12 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 164,00 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 12,01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 2
vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x79 0x13 0xC0 0xC5 0xF0 0x57 0xC9 0xC5
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0
==27243== valgrind: Unrecognised instruction at address 0x1331a8a8.
I suspect this is because the rocBLAS library contains AVX instructions. A few places in the rocBLAS CMake files pass -mf16c to enable F16C instructions. It seems that clang takes this as license to enable all AVX instructions, so the impact of that flag ends up being much broader than you would expect [1].
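As a rough cross-check of the valgrind output, the first five reported bytes can be decoded by hand. The sketch below is only an illustration: decode_vex3 is a hypothetical helper name, and it handles nothing beyond the 3-byte VEX prefix (0xC4) seen in this log.

```python
# Sketch: decode the 3-byte VEX prefix from the bytes valgrind reported.
# decode_vex3 is a hypothetical helper written for this message only.
raw = bytes([0xC4, 0xE2, 0x79, 0x13, 0xC0])

def decode_vex3(b):
    if b[0] != 0xC4:
        raise ValueError("not a 3-byte VEX prefix")
    p1, p2 = b[1], b[2]
    return {
        # R, X and B are stored inverted in the prefix
        "R": ((p1 >> 7) & 1) ^ 1,
        "X": ((p1 >> 6) & 1) ^ 1,
        "B": ((p1 >> 5) & 1) ^ 1,
        # low five bits select the opcode map
        "map": {1: "0F", 2: "0F38", 3: "0F3A"}.get(p1 & 0x1F, "reserved"),
        "W": (p2 >> 7) & 1,
        # vvvv is stored inverted; valgrind prints the decoded value as nVVVV
        "nvvvv": ~(p2 >> 3) & 0xF,
        "L": (p2 >> 2) & 1,
        # implied legacy prefix
        "prefix": ["none", "66", "F3", "F2"][p2 & 3],
        "opcode": b[3],
        "modrm": b[4],
    }

fields = decode_vex3(raw)
print(fields)
```

The decoded fields agree with what valgrind printed (ESC=0F38, PFX.66=1, VEX.L=0, VEX.nVVVV=0x0). If I am reading the opcode tables correctly, opcode 0x13 in the 0F38 map with a 66 prefix is VCVTPH2PS, which is itself an F16C instruction, and the bytes that follow (0xC5 0xF0 0x57 0xC9) look like a VEX-encoded XORPS.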
We could probably backport the upstream fix [2], although we might need to backport a few other changes along with it. I vaguely recall trying to disable that flag in the past and getting build failures.
Sincerely,
Cory Bloor
[1]: https://github.com/ROCm/rocBLAS/issues/1422
[2]: https://github.com/ROCm/rocBLAS/commit/c6bc09073959a2881a701b88ae1ed9de469354f1