
Bug#1075724: marked as done (rocblas: Gives SIGILL on CPUs without the f16c extension)



Your message dated Thu, 04 Jul 2024 05:35:35 +0000
with message-id <E1sPF83-000J2p-5F@fasolo.debian.org>
and subject line Bug#1075724: fixed in rocblas 5.5.1+dfsg-6
has caused the Debian Bug report #1075724,
regarding rocblas: Gives SIGILL on CPUs without the f16c extension
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1075724: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1075724
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: rocblas
Version: 5.5.1+dfsg-5
Tags: patch

When compiling llama.cpp with ROCm support and running it, I get an
illegal instruction crash in the binary.  The cause seems to be that
rocblas is built with -mf16c.

I built llama.cpp using this command line:

  HIPCXX=clang-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON

I see the crash after downloading a model from huggingface and starting
bin/llama-cli with this model.  Using Valgrind, I get this report at the
point of the crash:

   ==27243== Warning: set address range perms: large range [0x221c55000,
   0x231e56000) (noaccess)
   llama_kv_cache_init:      ROCm0 KV buffer size =   256,00 MiB
   llama_new_context_with_model: KV self size  =  256,00 MiB, K (f16):  128,00
   MiB, V (f16):  128,00 MiB
   llama_new_context_with_model:  ROCm_Host  output buffer size =     0,12 MiB
   llama_new_context_with_model:      ROCm0 compute buffer size =   164,00 MiB
   llama_new_context_with_model:  ROCm_Host compute buffer size =    12,01 MiB
   llama_new_context_with_model: graph nodes  = 1030
   llama_new_context_with_model: graph splits = 2
   vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x79 0x13 0xC0 0xC5
   0xF0 0x57 0xC9 0xC5
   vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
   vex amd64->IR:   VEX=1 VEX.L=0 VEX.nVVVV=0x0 ESC=0F38
   vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
   ==27243== valgrind: Unrecognised instruction at address 0x1331a8a8.
   ==27243==    at 0x1331A8A8: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13326E28: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13157CBA: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13155D51: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314DB31: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in
   /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
   ==27243==    by 0x4AA55CD:
   ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor
   const*, ggml_tensor const*, ggml_tensor*) (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*,
   ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x48D1A32: llama_decode (in
   /home/pere/src/ki/llama.cpp/build/src/libllama.so)
   ==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in
   /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
   ==27243== Your program just tried to execute an instruction that Valgrind
   ==27243== did not recognise.  There are two possible reasons for this.
   ==27243== 1. Your program has a bug and erroneously jumped to a non-code
   ==27243==    location.  If you are running Memcheck and you just saw a
   ==27243==    warning about a bad jump, it's probably your program's fault.
   ==27243== 2. The instruction is legitimate but Valgrind doesn't handle it,
   ==27243==    i.e. it's Valgrind's fault.  If you think this is the case or
   ==27243==    you are not sure, please let us know and we'll try to fix it.
   ==27243== Either way, Valgrind will now raise a SIGILL signal which will
   ==27243== probably kill your program.
   ==27243== 
   ==27243== Process terminating with default action of signal 4 (SIGILL)
   ==27243==  Illegal opcode at address 0x1331A8A8
   ==27243==    at 0x1331A8A8: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13326E28: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13157CBA: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x13155D51: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314DB31: ??? (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1314B477: rocblas_gemm_batched_ex (in
   /usr/lib/x86_64-linux-gnu/librocblas.so.0.1)
   ==27243==    by 0x1305CD09: hipblasGemmBatchedEx (in
   /usr/lib/x86_64-linux-gnu/libhipblas.so.0.1)
   ==27243==    by 0x4AA55CD:
   ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor
   const*, ggml_tensor const*, ggml_tensor*) (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A94C71: ggml_backend_cuda_graph_compute(ggml_backend*,
   ggml_cgraph*) (in /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x4A1E61C: ggml_backend_sched_graph_compute_async (in
   /home/pere/src/ki/llama.cpp/build/ggml/src/libggml.so)
   ==27243==    by 0x48D1A32: llama_decode (in
   /home/pere/src/ki/llama.cpp/build/src/libllama.so)
   ==27243==    by 0x13C4EC: llama_init_from_gpt_params(gpt_params&) (in
   /home/pere/src/ki/llama.cpp/build/bin/llama-cli)
   ==27243== 
   ==27243== HEAP SUMMARY:
   ==27243==     in use at exit: 659,260,263 bytes in 3,380,913 blocks
   ==27243==   total heap usage: 19,712,537 allocs, 16,331,624 frees,
   5,271,145,975 bytes allocated
   ==27243== 
   ==27243== LEAK SUMMARY:
   ==27243==    definitely lost: 120 bytes in 3 blocks
   ==27243==    indirectly lost: 2,422 bytes in 45 blocks
   ==27243==      possibly lost: 18,964 bytes in 160 blocks
   ==27243==    still reachable: 659,238,757 bytes in 3,380,705 blocks
   ==27243==                       of which reachable via heuristic:
   ==27243==                         multipleinheritance: 1,056 bytes in 12
   blocks
   ==27243==         suppressed: 0 bytes in 0 blocks
   ==27243== Rerun with --leak-check=full to see details of leaked memory
   ==27243== 
   ==27243== For lists of detected and suppressed errors, rerun with: -s
   ==27243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
   Ulovlig instruksjon (SIGILL)  [Norwegian: "Illegal instruction"]

According to Cory Bloor, the disassembly of those bytes shows that it is
the vcvtph2ps instruction causing the crash:

  0:  c4 e2 79 13 c0          vcvtph2ps xmm0,xmm0
  5:  c5 f0 57 c9             vxorps xmm1,xmm1,xmm1
  9:  c5                      .byte 0xc5

I managed to avoid the crash and get llama.cpp working by applying the
following patch and rebuilding rocblas:

--- rocblas-5.5.1+dfsg.orig/library/src/CMakeLists.txt
+++ rocblas-5.5.1+dfsg/library/src/CMakeLists.txt
@@ -411,7 +411,7 @@ endif()
 #  -fno-gpu-rdc compiler option was used with hcc, so revisit feature at some point
 
 # GCC or hip-clang needs specific flags to turn on f16c intrinsics
-target_compile_options( rocblas PRIVATE -mf16c )
+#target_compile_options( rocblas PRIVATE -mf16c )
 
 # Do not allow Variable Length Arrays (use unique_ptr instead)
 target_compile_options( rocblas PRIVATE -Werror=vla )

Please consider including it in an upload to Debian.

According to https://github.com/ROCm/rocBLAS/issues/1422 and
<URL: https://github.com/ROCm/rocBLAS/commit/c6bc09073959a2881a701b88ae1ed9de469354f1 >,
the issue might already be fixed upstream, but I have not tested that
version.

See also <URL: https://lists.debian.org/debian-ai/2024/07/msg00007.html >.

-- 
Happy hacking
Petter Reinholdtsen

--- End Message ---
--- Begin Message ---
Source: rocblas
Source-Version: 5.5.1+dfsg-6
Done: Cordell Bloor <cgmb@slerp.xyz>

We believe that the bug you reported is fixed in the latest version of
rocblas, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1075724@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Cordell Bloor <cgmb@slerp.xyz> (supplier of updated rocblas package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Format: 1.8
Date: Wed, 03 Jul 2024 15:23:54 -0600
Source: rocblas
Architecture: source
Version: 5.5.1+dfsg-6
Distribution: unstable
Urgency: medium
Maintainer: Debian ROCm Team <debian-ai@lists.debian.org>
Changed-By: Cordell Bloor <cgmb@slerp.xyz>
Closes: 1075724
Changes:
 rocblas (5.5.1+dfsg-6) unstable; urgency=medium
 .
   * Add d/p/0023-remove-mf16c-flag.patch to fix compatibility with non-AVX
     processors. (Closes: #1075724)
   * d/rules: simplify target architectures to match other ROCm libraries,
     dropping xnack- specialization on gfx906, gfx908 and xnack-/xnack+
     specialization on gfx90a (except in tensile kernels).
   * Add d/p/0024-use-xnack-specialized-assembly-kernels-with-gfx90a.patch
     to retain xnack-/xnack+ specialization in tensile kernels.
   * d/gbp.conf: ensure use of pristine-tar
Checksums-Sha1:
 a5b662f5bfbcdd9b5b63d7b666358336b2c95a94 3272 rocblas_5.5.1+dfsg-6.dsc
 dc0a15fe8fa1764103af0c60caddb10275912618 371716 rocblas_5.5.1+dfsg-6.debian.tar.xz
 1bceb31af6ccac91ab041082ccd6fe2a71669bcd 8739 rocblas_5.5.1+dfsg-6_source.buildinfo
Checksums-Sha256:
 b8e0ebff208f8273ecde2c8658e70be5c01449b758b21e8535921907059ac52e 3272 rocblas_5.5.1+dfsg-6.dsc
 777054071fa3d61e6464a33a1d3aeebd6b1178c2c3138a94b7c98d16d3b249b1 371716 rocblas_5.5.1+dfsg-6.debian.tar.xz
 92c8f65ee80bc19d8cbe9984c4e5db25ef543cfe2a6e431ee43b50865c5eb0aa 8739 rocblas_5.5.1+dfsg-6_source.buildinfo
Files:
 a1d58a23889abf11e7853af2a8383092 3272 devel optional rocblas_5.5.1+dfsg-6.dsc
 9dab9e4a46feae88be194828888ebb49 371716 devel optional rocblas_5.5.1+dfsg-6.debian.tar.xz
 9f94da6a7412375c9b13f22a4c4af318 8739 devel optional rocblas_5.5.1+dfsg-6_source.buildinfo

-----BEGIN PGP SIGNATURE-----

iQJDBAEBCgAtFiEEDOBv45KqSPjSLkRx+e+O4MAhYw8FAmaFxOoPHGNnbWJAc2xl
cnAueHl6AAoJEPnvjuDAIWMPlnkP/1fJdYzJJjZWhoanyWaWoV6gCqwVUClmEURc
Xv3DdaWH748eWmig/9QIo06JTksFNQTn0faYhKxDA9ouk2x6md+N8YkPujw+1CdM
9tIywAeAAZPHkizxcTDg3Tf6w5GH2mU0/0tNXOQiKAJIs6TXPLNZh25e01i9hwM2
iP0uM7T2keFd33D1PUK1Mchh9+ZYxCxlVswY0Fafxs3ciktrqrWDW9ynv4FjBIJd
+KiJ00EMB2FkvXQCsAh7PJHhIujxNoh+BUqmgrqPP8dmrCM2I1VDqwyzzP9ad9BV
IUvPIm3EAFmPpSzg7qAxC/TT1d/faHx4ydvpW0sJafkxFDnxUGtd9bqchemm1lj9
bWE2v3OeAClnicE/qkV9TEsfoFywqODe87SbiP0jdi94y3MrMFm5JnyqEwQiiqYu
Z3O2K1rVsrHBysNAQjSeIXU2zynQyXaYocABzpEZuyWCcQvg5XoZwO/SAw5Vt4So
8zOSmTDD31AUrEoS88pmebyevBnQSPoH8thIndZoWYPoNIVUnyFWvIz0eX7YEt0V
nbNswBrK8yp6IY4ZdD4vHOwlKn8qltvdzlcQmdXL77G5uhyGkN/05nk2i1kmPCR4
i699v99CENKviZiuH3I0D5s0pPX+zPscVSaLpEaAwmlOtcRv77+haOzkwbsqXFz5
fVlewHq1
=Po1M
-----END PGP SIGNATURE-----

Attachment: pgp35sSLMLQhM.pgp
Description: PGP signature


--- End Message ---
