Your message dated Thu, 04 Jul 2024 05:35:35 +0000
with message-id <E1sPF83-000J2p-5F@fasolo.debian.org>
and subject line Bug#1075724: fixed in rocblas 5.5.1+dfsg-6
has caused the Debian Bug report #1075724,
regarding rocblas: Gives SIGILL on CPUs without the f16c extension
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)

--
1075724: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1075724
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
- To: submit@bugs.debian.org
- Subject: rocblas: Gives SIGILL on CPUs without the f16c extension
- From: Petter Reinholdtsen <pere@hungry.com>
- Date: Wed, 03 Jul 2024 20:04:08 +0200
- Message-id: <sa6v81mnyhz.fsf@hjemme.reinholdtsen.name>
Package: rocblas
Version: 5.5.1+dfsg-5
Tags: patch

When compiling llama.cpp with ROCm support and running it, I get an
illegal instruction crash in the binary. The cause seems to be that
rocblas is built with -mf16c.

I built llama.cpp using this command line:

  HIPCXX=clang-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON \
    -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" \
    -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON

I see the crash after downloading a model from huggingface and starting
bin/llama-cli using this model. Using valgrind, I get this report from
the crash:

==27243== Warning: set address range perms: large range [0x221c55000, 0x231e56000) (noaccess)
llama_kv_cache_init:      ROCm0 KV buffer size =   256,00 MiB
llama_new_context_with_model: KV self size  =  256,00 MiB, K (f16):  128,00 MiB, V (f16):  128,00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0,12 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   164,00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    12,01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x79 0x13 0xC0 0xC5 0xF0 0x57 0xC9 0xC5
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
==27243== valgrind: Unrecognised instruction at address 0x1331a8a8.
==27243==    at 0x1331A8A8: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13326E28: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13157CBA: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13155D51: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314DB31: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
==27243==    by 0x4AA55CD: ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x48D1A32: llama_decode (in /home/pere/src/ki/llama.cpp/build/src/libllama.so)
==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
==27243== Your program just tried to execute an instruction that Valgrind
==27243== did not recognise.  There are two possible reasons for this.
==27243== 1. Your program has a bug and erroneously jumped to a non-code
==27243==    location.  If you are running Memcheck and you just saw a
==27243==    warning about a bad jump, it's probably your program's fault.
==27243== 2. The instruction is legitimate but Valgrind doesn't handle it,
==27243==    i.e. it's Valgrind's fault.  If you think this is the case or
==27243==    you are not sure, please let us know and we'll try to fix it.
==27243== Either way, Valgrind will now raise a SIGILL signal which will
==27243== probably kill your program.
==27243==
==27243== Process terminating with default action of signal 4 (SIGILL)
==27243==  Illegal opcode at address 0x1331A8A8
==27243==    at 0x1331A8A8: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13326E28: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13157CBA: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x13155D51: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314DB31: ??? (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
==27243==    by 0x4AA55CD: ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
==27243==    by 0x48D1A32: llama_decode (in /home/pere/src/ki/llama.cpp/build/src/libllama.so)
==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
==27243==
==27243== HEAP SUMMARY:
==27243==     in use at exit: 659,260,263 bytes in 3,380,913 blocks
==27243==   total heap usage: 19,712,537 allocs, 16,331,624 frees, 5,271,145,975 bytes allocated
==27243==
==27243== LEAK SUMMARY:
==27243==    definitely lost: 120 bytes in 3 blocks
==27243==    indirectly lost: 2,422 bytes in 45 blocks
==27243==      possibly lost: 18,964 bytes in 160 blocks
==27243==    still reachable: 659,238,757 bytes in 3,380,705 blocks
==27243==                       of which reachable via heuristic:
==27243==                         multipleinheritance: 1,056 bytes in 12 blocks
==27243==         suppressed: 0 bytes in 0 blocks
==27243== Rerun with --leak-check=full to see details of leaked memory
==27243==
==27243== For lists of detected and suppressed errors, rerun with: -s
==27243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (SIGILL)

According to Cory Bloor, the disassembly of those bytes shows that it is
the vcvtph2ps instruction causing the crash:

   0:   c4 e2 79 13 c0          vcvtph2ps xmm0,xmm0
   5:   c5 f0 57 c9             vxorps xmm1,xmm1,xmm1
   9:   c5                      .byte 0xc5

I managed to avoid the crash and get llama.cpp working by applying the
following patch and rebuilding rocblas:

--- rocblas-5.5.1+dfsg.orig/library/src/CMakeLists.txt
+++ rocblas-5.5.1+dfsg/library/src/CMakeLists.txt
@@ -411,7 +411,7 @@ endif()
 # -fno-gpu-rdc compiler option was used with hcc, so revisit feature at some point
 
 # GCC or hip-clang needs specific flags to turn on f16c intrinsics
-target_compile_options( rocblas PRIVATE -mf16c )
+#target_compile_options( rocblas PRIVATE -mf16c )
 
 # Do not allow Variable Length Arrays (use unique_ptr instead)
 target_compile_options( rocblas PRIVATE -Werror=vla )

Please consider including it in an upload to Debian.

According to https://github.com/ROCm/rocBLAS/issues/1422 and
<URL: https://github.com/ROCm/rocBLAS/commit/c6bc09073959a2881a701b88ae1ed9de469354f1 >,
the issue might already be fixed upstream, but I have not tested that
version. See also
<URL: https://lists.debian.org/debian-ai/2024/07/msg00007.html >.

--
Happy hacking
Petter Reinholdtsen
--- End Message ---
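[Editorial note for context: building librocblas with -mf16c allows the compiler to use F16C instructions such as vcvtph2ps for half-precision conversions, which then fault on CPUs without the extension. The sketch below is illustrative only and is not taken from rocBLAS, the Debian patch, or the upstream fix; the file and function names (f16c_dispatch.c, cpu_has_f16c, half_to_float_hw, half_to_float_sw) are made up. It shows one common way to keep a vcvtph2ps fast path without compiling everything with -mf16c: check CPUID leaf 1, ECX bit 29 at runtime and confine the F16C intrinsic to a single function carrying a target attribute, with a plain-integer fallback for older CPUs.]

/* f16c_dispatch.c -- illustrative sketch, not rocBLAS code.
 * Builds without -mf16c, e.g.: gcc -O2 f16c_dispatch.c */
#include <cpuid.h>
#include <immintrin.h>
#include <stdio.h>

/* CPUID leaf 1, ECX bit 29 reports F16C support. */
static int cpu_has_f16c(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 29) & 1;
}

/* Only this function is compiled with the f16c feature enabled;
 * the rest of the file stays baseline x86-64. */
__attribute__((target("f16c")))
static float half_to_float_hw(unsigned short h) {
    return _cvtsh_ss(h);   /* uses vcvtph2ps */
}

/* Portable fallback: IEEE 754 binary16 -> binary32 by bit manipulation
 * (subnormal halves are flushed to zero for brevity). */
static float half_to_float_sw(unsigned short h) {
    unsigned int sign = (unsigned int)(h >> 15) & 1u;
    unsigned int exp  = (unsigned int)(h >> 10) & 0x1fu;
    unsigned int frac = (unsigned int)h & 0x3ffu;
    unsigned int bits;
    if (exp == 0x1f)          /* infinity or NaN */
        bits = (sign << 31) | 0x7f800000u | (frac << 13);
    else if (exp == 0)        /* zero (and flushed subnormals) */
        bits = sign << 31;
    else                      /* normal number: rebias 15 -> 127 */
        bits = (sign << 31) | ((exp + 112u) << 23) | (frac << 13);
    float f;
    __builtin_memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void) {
    unsigned short half = 0x3800;  /* 0.5 encoded as binary16 */
    float v = cpu_has_f16c() ? half_to_float_hw(half)
                             : half_to_float_sw(half);
    printf("0x3800 -> %f (F16C %s)\n", v,
           cpu_has_f16c() ? "used" : "not available");
    return 0;
}

With this shape, the only vcvtph2ps in the binary sits behind the CPUID
check, so the program degrades to the scalar path instead of dying with
SIGILL on hardware without F16C.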
--- Begin Message ---
- To: 1075724-close@bugs.debian.org
- Subject: Bug#1075724: fixed in rocblas 5.5.1+dfsg-6
- From: Debian FTP Masters <ftpmaster@ftp-master.debian.org>
- Date: Thu, 04 Jul 2024 05:35:35 +0000
- Message-id: <E1sPF83-000J2p-5F@fasolo.debian.org>
- Reply-to: Cordell Bloor <cgmb@slerp.xyz>
Source: rocblas
Source-Version: 5.5.1+dfsg-6
Done: Cordell Bloor <cgmb@slerp.xyz>

We believe that the bug you reported is fixed in the latest version of
rocblas, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1075724@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Cordell Bloor <cgmb@slerp.xyz> (supplier of updated rocblas package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Format: 1.8
Date: Wed, 03 Jul 2024 15:23:54 -0600
Source: rocblas
Architecture: source
Version: 5.5.1+dfsg-6
Distribution: unstable
Urgency: medium
Maintainer: Debian ROCm Team <debian-ai@lists.debian.org>
Changed-By: Cordell Bloor <cgmb@slerp.xyz>
Closes: 1075724
Changes:
 rocblas (5.5.1+dfsg-6) unstable; urgency=medium
 .
   * Add d/p/0023-remove-mf16c-flag.patch to fix compatibility with
     non-AVX processors. (Closes: #1075724)
   * d/rules: simplify target architectures to match other ROCm
     libraries, dropping xnack- specialization on gfx906, gfx908 and
     xnack-/xnack+ specialization on gfx90a (except in tensile kernels).
   * Add d/p/0024-use-xnack-specialized-assembly-kernels-with-gfx90a.patch
     to retain xnack-/xnack+ specialization in tensile kernels.
   * d/gbp.conf: ensure use of pristine-tar
Checksums-Sha1:
 a5b662f5bfbcdd9b5b63d7b666358336b2c95a94 3272 rocblas_5.5.1+dfsg-6.dsc
 dc0a15fe8fa1764103af0c60caddb10275912618 371716 rocblas_5.5.1+dfsg-6.debian.tar.xz
 1bceb31af6ccac91ab041082ccd6fe2a71669bcd 8739 rocblas_5.5.1+dfsg-6_source.buildinfo
Checksums-Sha256:
 b8e0ebff208f8273ecde2c8658e70be5c01449b758b21e8535921907059ac52e 3272 rocblas_5.5.1+dfsg-6.dsc
 777054071fa3d61e6464a33a1d3aeebd6b1178c2c3138a94b7c98d16d3b249b1 371716 rocblas_5.5.1+dfsg-6.debian.tar.xz
 92c8f65ee80bc19d8cbe9984c4e5db25ef543cfe2a6e431ee43b50865c5eb0aa 8739 rocblas_5.5.1+dfsg-6_source.buildinfo
Files:
 a1d58a23889abf11e7853af2a8383092 3272 devel optional rocblas_5.5.1+dfsg-6.dsc
 9dab9e4a46feae88be194828888ebb49 371716 devel optional rocblas_5.5.1+dfsg-6.debian.tar.xz
 9f94da6a7412375c9b13f22a4c4af318 8739 devel optional rocblas_5.5.1+dfsg-6_source.buildinfo

-----BEGIN PGP SIGNATURE-----

iQJDBAEBCgAtFiEEDOBv45KqSPjSLkRx+e+O4MAhYw8FAmaFxOoPHGNnbWJAc2xl
cnAueHl6AAoJEPnvjuDAIWMPlnkP/1fJdYzJJjZWhoanyWaWoV6gCqwVUClmEURc
Xv3DdaWH748eWmig/9QIo06JTksFNQTn0faYhKxDA9ouk2x6md+N8YkPujw+1CdM
9tIywAeAAZPHkizxcTDg3Tf6w5GH2mU0/0tNXOQiKAJIs6TXPLNZh25e01i9hwM2
iP0uM7T2keFd33D1PUK1Mchh9+ZYxCxlVswY0Fafxs3ciktrqrWDW9ynv4FjBIJd
+KiJ00EMB2FkvXQCsAh7PJHhIujxNoh+BUqmgrqPP8dmrCM2I1VDqwyzzP9ad9BV
IUvPIm3EAFmPpSzg7qAxC/TT1d/faHx4ydvpW0sJafkxFDnxUGtd9bqchemm1lj9
bWE2v3OeAClnicE/qkV9TEsfoFywqODe87SbiP0jdi94y3MrMFm5JnyqEwQiiqYu
Z3O2K1rVsrHBysNAQjSeIXU2zynQyXaYocABzpEZuyWCcQvg5XoZwO/SAw5Vt4So
8zOSmTDD31AUrEoS88pmebyevBnQSPoH8thIndZoWYPoNIVUnyFWvIz0eX7YEt0V
nbNswBrK8yp6IY4ZdD4vHOwlKn8qltvdzlcQmdXL77G5uhyGkN/05nk2i1kmPCR4
i699v99CENKviZiuH3I0D5s0pPX+zPscVSaLpEaAwmlOtcRv77+haOzkwbsqXFz5
fVlewHq1
=Po1M
-----END PGP SIGNATURE-----

Attachment: pgp35sSLMLQhM.pgp
Description: PGP signature
--- End Message ---