
Bug#1033724: clang-16 fails to detect amdgpu arch

Package: clang-16
Version: 1:16.0.0-1~exp5
Severity: normal
X-Debbugs-Cc: cgmb@slerp.xyz, debian-ai@lists.debian.org

Dear Maintainer,

Thank you for packaging amdgpu-arch as part of clang-tools-16! I see
that it can correctly detect the architecture of my installed GPU when
libhsa-runtime-dev is installed. Should libhsa-runtime-dev be
added to the list of Suggested packages?

In any case, I would expect clang++-16 to use amdgpu-arch to autodetect
the default GPU build target for HIP programs, but that doesn't seem to
be happening. I have an RX 5700 XT installed in my workstation, and when
I run amdgpu-arch-16, I see:

    $ amdgpu-arch-16 2> /dev/null
    gfx1010

However, when I build a sample program, it is not built for the right
architecture unless I specify it manually. For example,

main.hip:

    #include <stdio.h>
    #include <stdlib.h>
    #include <hip/hip_runtime.h>
    
    #define CHECK_HIP(expr) do {              \
      hipError_t result = (expr);             \
      if (result != hipSuccess) {             \
        fprintf(stderr, "%s:%d: %s (%d)\n",   \
          __FILE__, __LINE__,                 \
          hipGetErrorString(result), result); \
        exit(EXIT_FAILURE);                   \
      }                                       \
    } while(0)
    
    __global__ void sq_arr(float *arr, int n) {
      int tid = blockDim.x*blockIdx.x + threadIdx.x;
      if (tid < n) {
        arr[tid] = arr[tid] * arr[tid];
      }
    }
    
    int main() {
      enum { N = 5 };
      float hArr[N] = { 1, 2, 3, 4, 5 };
      float *dArr;
      CHECK_HIP(hipMalloc(&dArr, sizeof(float) * N));
      CHECK_HIP(hipMemcpy(dArr, hArr, sizeof(float) * N, hipMemcpyHostToDevice));
      sq_arr<<<dim3(1), dim3(32,1,1), 0, 0>>>(dArr, N);
      CHECK_HIP(hipMemcpy(hArr, dArr, sizeof(float) * N, hipMemcpyDeviceToHost));
      for (int i = 0; i < N; ++i) {
        printf("%f\n", hArr[i]);
      }
      CHECK_HIP(hipFree(dArr));
      return 0;
    }


build command:

    clang++-16 --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode -x hip main.hip -lamdhip64

output:

    $ ./a.out
    KFD does not support xnack mode query.
    ROCr must assume xnack is disabled.
    "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
    Aborted (core dumped)

If I add -v to the build command, I can see that the compiler is
building for gfx906, which I presume is the default build architecture.
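
For reference, that was just the same build command with -v appended.
The verbose output goes to stderr, so something like the following shows
the device architecture being used (I'm assuming that grepping for "gfx"
is enough to spot it in the device-side cc1 job):

    clang++-16 -v --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode -x hip main.hip -lamdhip64 2>&1 | grep gfx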

I'm not sure exactly why the architecture detection is failing. Could it
be caused by the rename from amdgpu-arch to amdgpu-arch-16? Or perhaps
it's because, in addition to printing my GPU architecture to stdout,
amdgpu-arch also prints this message to stderr?

    KFD does not support xnack mode query.
    ROCr must assume xnack is disabled.
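
One way to narrow that down might be to watch which helper programs the
driver actually tries to execute during the build (this assumes the
detection shells out to a separate amdgpu-arch binary, which I haven't
verified):

    strace -f -e trace=execve clang++-16 --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode -x hip main.hip -lamdhip64 2>&1 | grep amdgpu-arch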

Whatever the exact cause, I get the expected output when I explicitly
specify the architecture:

    clang++-16 --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode --offload-arch=gfx1010 -x hip main.hip -lamdhip64

expected output:

    # ./a.out 
    KFD does not support xnack mode query.
    ROCr must assume xnack is disabled.
    1.000000
    4.000000
    9.000000
    16.000000
    25.000000

The logic within clang for invoking amdgpu-arch lives in the AMDGPU
toolchain driver (clang/lib/Driver/ToolChains/AMDGPU.cpp). The answer to
what is happening can probably be found in the short function there that
deals with amdgpu-arch [1].
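
I haven't traced through that function, but conceptually the detection
presumably has to find an amdgpu-arch executable, run it, and treat each
line of its stdout as an offload architecture. Below is a rough,
standalone sketch of that idea; it is not the driver code, and the
lookup by the plain name "amdgpu-arch" and the line-by-line parsing are
my assumptions. If the lookup really is by the literal tool name, the
Debian rename to amdgpu-arch-16 alone would explain the silent fallback
to gfx906:

    #include <cstdio>
    #include <memory>
    #include <string>
    #include <vector>

    // Rough illustration: run an external detector tool and collect its
    // stdout, one GPU architecture per line (e.g. "gfx1010"). If the real
    // driver looks up a binary literally named "amdgpu-arch", the Debian
    // rename to "amdgpu-arch-16" would leave the GPU undetected and clang
    // would silently fall back to its default (gfx906).
    static std::vector<std::string> detectSystemGPUs(const std::string &tool) {
      std::vector<std::string> archs;
      // Discard stderr so the ROCr xnack warning cannot pollute the list.
      std::string cmd = tool + " 2>/dev/null";
      std::unique_ptr<FILE, int (*)(FILE *)> out(popen(cmd.c_str(), "r"), pclose);
      if (!out)
        return archs;
      char line[256];
      while (fgets(line, sizeof(line), out.get())) {
        std::string arch(line);
        while (!arch.empty() && (arch.back() == '\n' || arch.back() == '\r'))
          arch.pop_back();
        if (!arch.empty())
          archs.push_back(arch);
      }
      return archs;
    }

    int main() {
      // "amdgpu-arch-16" is the Debian name; presumably the stock driver
      // searches for plain "amdgpu-arch" instead.
      for (const std::string &arch : detectSystemGPUs("amdgpu-arch-16"))
        std::printf("detected offload arch: %s\n", arch.c_str());
      return 0;
    }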

P.S. I guess maybe the bitcode works across versions after all? I
wouldn't trust it without running a large test suite on a variety of
hardware platforms, but it's nice that it at least seems to work for
this particular example on this particular architecture. If it does
turn out to be compatible, we won't need to patch clang as was done for
LLVM 15, since LLVM 16 now supports configuration files for specifying
default command-line options [2].
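
For example, a config file along these lines might be enough (the file
name and install location below are placeholders; I haven't checked what
naming Debian's packaging would actually need):

    # clang++-16.cfg (placeholder name); can also be passed with --config <file>
    --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode

If a default config file like that were picked up automatically, the
--rocm-device-lib-path argument in the build commands above would no
longer be needed.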

[1]: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.0/clang/lib/Driver/ToolChains/AMDGPU.cpp#L751-L774
[2]: https://clang.llvm.org/docs/UsersManual.html#configuration-files

-- System Information:
Debian Release: 12.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-5-amd64 (SMP w/32 CPU threads; PREEMPT)
Kernel taint flags: TAINT_WARN
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages clang-16 depends on:
ii  binutils                2.40-2
ii  libc6                   2.36-8
ii  libc6-dev               2.36-8
ii  libclang-common-16-dev  1:16.0.0-1~exp5
ii  libclang-cpp16          1:16.0.0-1~exp5
ii  libclang1-16            1:16.0.0-1~exp5
ii  libgcc-12-dev           12.2.0-14
ii  libgcc-s1               12.2.0-14
ii  libllvm16               1:16.0.0-1~exp5
ii  libobjc-12-dev          12.2.0-14
ii  libstdc++-12-dev        12.2.0-14
ii  libstdc++6              12.2.0-14
ii  llvm-16-linker-tools    1:16.0.0-1~exp5

Versions of packages clang-16 recommends:
ii  llvm-16-dev  1:16.0.0-1~exp5
ii  python3      3.11.2-1

Versions of packages clang-16 suggests:
pn  clang-16-doc  <none>
pn  wasi-libc     <none>

-- no debconf information

