
Re: Building llama.cpp for AMD GPU using only Debian packages?



Hi Petter,

On 2024-07-01 07:12, Petter Reinholdtsen wrote:
Anyway, now I get a crash in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1. These are the last lines
from a valgrind run:

==27243== Warning: set address range perms: large range [0x221c55000, 0x231e56000) (noaccess)
llama_kv_cache_init:      ROCm0 KV buffer size =   256,00 MiB
llama_new_context_with_model: KV self size  =  256,00 MiB, K (f16):  128,00 MiB, V (f16):  128,00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0,12 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   164,00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    12,01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x79 0x13 0xC0 0xC5 0xF0 0x57 0xC9 0xC5
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==27243== valgrind: Unrecognised instruction at address 0x1331a8a8.

I suspect this is because the rocBLAS library contains AVX instructions. There are a few places in the rocBLAS CMake files where -mf16c is passed to enable F16C instructions. It seems that clang takes this as license to enable all AVX instructions, so the impact of that flag ends up being much broader than you would expect [1].
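For reference, the valgrind lines above are a decode of a three-byte VEX prefix (0xC4 0xE2 0x79), the encoding used by AVX-era instructions. The minimal sketch below, assuming the prefix layout from the Intel SDM, unpacks those three bytes and reproduces the fields valgrind printed (ESC=0F38, 66 prefix, L=0, vvvv=0). The function name decode_vex3 is hypothetical and only for illustration. The opcode byte that follows, 0x13 in the 0F38 map with a 66 prefix, is VCVTPH2PS, the F16C half-to-single conversion, and the bytes after it (0xC5 0xF0 0x57 0xC9) decode to the AVX instruction vxorps xmm1, xmm1, xmm1.

```python
# Hypothetical helper: decode a three-byte VEX prefix (0xC4 xx xx),
# assuming the standard Intel SDM layout, to mirror valgrind's output.
def decode_vex3(b0: int, b1: int, b2: int) -> dict:
    if b0 != 0xC4:
        raise ValueError("not a three-byte VEX prefix")
    # Low 5 bits of the second byte select the opcode map.
    maps = {1: "0F", 2: "0F38", 3: "0F3A"}
    # Low 2 bits of the third byte encode an implied legacy prefix.
    prefixes = {0: None, 1: "66", 2: "F3", 3: "F2"}
    return {
        # R, X, B and vvvv are stored bit-inverted in the prefix.
        "REX.R": ((b1 >> 7) & 1) ^ 1,
        "REX.X": ((b1 >> 6) & 1) ^ 1,
        "REX.B": ((b1 >> 5) & 1) ^ 1,
        "ESC": maps.get(b1 & 0x1F, "reserved"),
        "REX.W": (b2 >> 7) & 1,
        "VEX.nVVVV": ((b2 >> 3) & 0xF) ^ 0xF,
        "VEX.L": (b2 >> 2) & 1,
        "PFX": prefixes[b2 & 0x3],
    }


if __name__ == "__main__":
    # First three bytes from valgrind's "unhandled instruction bytes" line.
    print(decode_vex3(0xC4, 0xE2, 0x79))
```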

We could probably backport the upstream fix [2], although there might be a few other changes we need to backport along with it, as I vaguely recall trying to disable the flag in the past and getting build failures.

Sincerely,
Cory Bloor

[1]: https://github.com/ROCm/rocBLAS/issues/1422
[2]: https://github.com/ROCm/rocBLAS/commit/c6bc09073959a2881a701b88ae1ed9de469354f1

