Re: Building llama.cpp for AMD GPU using only Debian packages?
[Petter Reinholdtsen]
> Thank you. I got it building. But I guess I am missing some
> preparation step, as I get this when running llama-cli:
>
> rocBLAS error: Could not initialize Tensile host: No devices found
> Aborted (SIGABRT)
I managed to figure out this one; it turned out to be a missing
permission issue. Adding my user to the 'render' group got me further.
I wish there were a way to grant GPU access without giving full access
to the render device.
Anyway, now I get a crash in
/usr/lib/x86_64-linux-gnu/librocblas.so.0.1. These are the last lines
from a valgrind run:
==27243== Warning: set address range perms: large range [0x221c55000, 0x231e56000) (noaccess)
llama_kv_cache_init: ROCm0 KV buffer size = 256,00 MiB
llama_new_context_with_model: KV self size = 256,00 MiB, K (f16): 128,00 MiB, V (f16): 128,00 MiB
llama_new_context_with_model: ROCm_Host output buffer size = 0,12 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 164,00 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 12,01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 2
vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x79 0x13 0xC0 0xC5 0xF0 0x57 0xC9 0xC5
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0
==27243== valgrind: Unrecognised instruction at address 0x1331a8a8.
==27243== at 0x1331A8A8: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x13326E28: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x13157CBA: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x13155D51: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x1314DB31: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x1314B477: rocblas_gemm_batched_ex (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x1305CD09: hipblasGemmBatchedEx (in /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
==27243== by 0x4AA55CD: ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243== by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243== by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243== by 0x48D1A32: llama_decode (in /home/pere/src/ki/llama.cpp/build/src/libllama.so)
==27243== by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
==27243== Your program just tried to execute an instruction that Valgrind
==27243== did not recognise. There are two possible reasons for this.
==27243== 1. Your program has a bug and erroneously jumped to a non-code
==27243== location. If you are running Memcheck and you just saw a
==27243== warning about a bad jump, it's probably your program's fault.
==27243== 2. The instruction is legitimate but Valgrind doesn't handle it,
==27243== i.e. it's Valgrind's fault. If you think this is the case or
==27243== you are not sure, please let us know and we'll try to fix it.
==27243== Either way, Valgrind will now raise a SIGILL signal which will
==27243== probably kill your program.
==27243==
==27243== Process terminating with default action of signal 4 (SIGILL)
==27243== Illegal opcode at address 0x1331A8A8
==27243== at 0x1331A8A8: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x13326E28: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x13157CBA: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x13155D51: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x1314DB31: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x1314B477: rocblas_gemm_batched_ex (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243== by 0x1305CD09: hipblasGemmBatchedEx (in /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
==27243== by 0x4AA55CD: ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243== by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243== by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243== by 0x48D1A32: llama_decode (in /home/pere/src/ki/llama.cpp/build/src/libllama.so)
==27243== by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
==27243==
==27243== HEAP SUMMARY:
==27243== in use at exit: 659,260,263 bytes in 3,380,913 blocks
==27243== total heap usage: 19,712,537 allocs, 16,331,624 frees, 5,271,145,975 bytes allocated
==27243==
==27243== LEAK SUMMARY:
==27243== definitely lost: 120 bytes in 3 blocks
==27243== indirectly lost: 2,422 bytes in 45 blocks
==27243== possibly lost: 18,964 bytes in 160 blocks
==27243== still reachable: 659,238,757 bytes in 3,380,705 blocks
==27243== of which reachable via heuristic:
==27243== multipleinheritance: 1,056 bytes in 12 blocks
==27243== suppressed: 0 bytes in 0 blocks
==27243== Rerun with --leak-check=full to see details of leaked memory
==27243==
==27243== For lists of detected and suppressed errors, rerun with: -s
==27243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (SIGILL)
I assume the CPU in my old test machine lacks some required feature,
and that the library needs a rebuild to avoid it, or that I need to
finish putting the new test machine together to get a more recent
CPU. :)
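For what it is worth, the unhandled instruction bytes in the valgrind
output (0xC4 0xE2 0x79 0x13 ...) appear to decode to vcvtph2ps, an F16C
half-precision conversion, so a quick look at the CPU flags should tell
whether the CPU really lacks the feature or whether it is only valgrind
that does not handle the instruction (the second case valgrind itself
mentions above):

  # List which of the relevant extensions the CPU advertises;
  # if 'f16c' is missing, the CPU rather than valgrind is to blame.
  grep -o -w -e f16c -e avx -e avx2 /proc/cpuinfo | sort -u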
--
Happy hacking
Petter Reinholdtsen