Hi Étienne,
I ran the tests on my Radeon VII workstation, using Debian in Docker on an Ubuntu 20.04 host with the AMDGPU kernel module. My list of failures is much shorter. In the next few weeks, I can set up a native install of Debian and test more thoroughly.
98% tests passed, 10 tests failed out of 408
Total Test time (real) = 1501.40 sec
The following tests did not run:
     96 - directed_tests/g++/hipMalloc_cxx_amd.tst (Skipped)
The following tests FAILED:
     99 - directed_tests/hiprtc/hiprtcGetLoweredName.tst (SEGFAULT)
    100 - directed_tests/hiprtc/saxpy.tst (SEGFAULT)
    101 - directed_tests/ipc/hipMultiProcIpcEvent.tst (Timeout)
    102 - directed_tests/ipc/hipMultiProcIpcMem.tst (Timeout)
    196 - directed_tests/runtimeApi/memory/hipHostRegister.tst (Subprocess aborted)
    213 - directed_tests/runtimeApi/memory/hipMemcpy-dev-offsets.tst (Subprocess aborted)
    214 - directed_tests/runtimeApi/memory/hipMemcpy-host-offsets.tst (Subprocess aborted)
    290 - directed_tests/runtimeApi/module/hipExtLaunchKernelGGL_KernelExeTime.tst (Subprocess aborted)
    294 - directed_tests/runtimeApi/module/hipExtModuleLaunchKernel_KernelExecutionTime.tst (Subprocess aborted)
    391 - directed_tests/runtimeApi/stream/hipStreamCreateWithPriority.tst (Failed)
The two segmentation faults in tests #99 and #100 might be of concern. I caught the following message around them, so there may be another library that needs to be packaged in Debian too:
99: LoadLib(libhsa-amd-aqlprofile64.so) failed: libhsa-amd-aqlprofile64.so: cannot open shared object file: No such file or directory
That library adds HSA extensions for performance profiling. Unfortunately, it is proprietary. The last thing I heard about it was from John Bridgman [1].
1. The aqlprofile package is an optional one, used to generate packets with PM4 code for use with the profiler - ROCR checks for it at start but runs fine without it (might be worth checking on your current system)
2. We do not yet have approval to release source code - waiting to hear back on status
To my knowledge, the libhsa-amd-aqlprofile64 source has never
been approved for public release. We can ask Bridgman about it
again, but I expect that we will just need to patch out the
attempts to load the library. We might want to take a look through
the Gentoo packages to see how they handled this. There's a bug
report describing a similar problem [2] and a patch for the
rocr-runtime [3]. I imagine Spack also has to deal with this in
their HIP package [4], so I'll perhaps take a look at what they
do.
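To check whether the library is visible on a given system, something like the following Python sketch could be used. It relies only on the standard ctypes module and asks the dynamic loader to open the library by name. The helper name can_load is made up for illustration, and the sketch says nothing about what ROCR does after a failed load; it only reports whether the file can be opened.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the named shared library.

    can_load is a hypothetical helper for illustration, not part of ROCm.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    name = "libhsa-amd-aqlprofile64.so"
    status = "loadable" if can_load(name) else "not found by the dynamic loader"
    print(f"{name}: {status}")
```

If the library is reported as not found and the hiprtc tests still segfault, that would be consistent with the LoadLib message above, though it would not by itself prove that the missing library is the cause.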
[1]:
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1003883-gpuvm-discrete-gpu-code-for-amdkfd-radeon-compute-could-be-ready-for-linux-4-17?p=1004038#post1004038
[2]: https://bugs.gentoo.org/716948
[3]:
https://gitweb.gentoo.org/repo/gentoo.git/tree/dev-libs/rocr-runtime/files/rocr-runtime-4.3.0_no-sqlprofiler.patch
[4]:
https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/hip/package.py
Sincerely,
Cory Bloor