Re: Question about how to handle HIP vs hipamd



Hi Étienne,

I ran the tests on my Radeon VII workstation, using Debian in a Docker container on an Ubuntu 20.04 host with the AMDGPU kernel module. My list of failures is much shorter than yours. Within the next few weeks, I can set up a native Debian install and test more thoroughly.

98% tests passed, 10 tests failed out of 408

Total Test time (real) = 1501.40 sec

The following tests did not run: 
     96 - directed_tests/g++/hipMalloc_cxx_amd.tst (Skipped)

The following tests FAILED:
     99 - directed_tests/hiprtc/hiprtcGetLoweredName.tst (SEGFAULT)
    100 - directed_tests/hiprtc/saxpy.tst (SEGFAULT)
    101 - directed_tests/ipc/hipMultiProcIpcEvent.tst (Timeout)
    102 - directed_tests/ipc/hipMultiProcIpcMem.tst (Timeout)
    196 - directed_tests/runtimeApi/memory/hipHostRegister.tst (Subprocess aborted)
    213 - directed_tests/runtimeApi/memory/hipMemcpy-dev-offsets.tst (Subprocess aborted)
    214 - directed_tests/runtimeApi/memory/hipMemcpy-host-offsets.tst (Subprocess aborted)
    290 - directed_tests/runtimeApi/module/hipExtLaunchKernelGGL_KernelExeTime.tst (Subprocess aborted)
    294 - directed_tests/runtimeApi/module/hipExtModuleLaunchKernel_KernelExecutionTime.tst (Subprocess aborted)
    391 - directed_tests/runtimeApi/stream/hipStreamCreateWithPriority.tst (Failed)

That said, I'm most interested in whether rocm-hipamd can build the math libraries and pass their test suites. Passing those would prove to me that the basic HIP features work well enough to do useful work.

On 2022-05-17 14:33, Étienne Mollier wrote:
> The two segmentation faults in tests #99
> and #100 might be of concern. I caught the following around
> these, so there may be a library to package in Debian too:
>
> 	99: LoadLib(libhsa-amd-aqlprofile64.so) failed: libhsa-amd-aqlprofile64.so: cannot open shared object file: No such file or directory

That library adds HSA extensions for performance profiling. Unfortunately, it is proprietary. The last thing I heard about it was from John Bridgman [1].

On 2018-01-28 06:04, John Bridgman wrote:
> 1. The aqlprofile package is an optional one, used to generate packets with PM4 code for use with the profiler. ROCR checks for it at start but runs fine without it (might be worth checking on your current system).
>
> 2. We do not yet have approval to release source code; waiting to hear back on status.

To my knowledge, the libhsa-amd-aqlprofile64 source has never been approved for public release. We could ask John Bridgman about it again, but I expect we will simply need to patch out the attempts to load the library. The Gentoo packages may show how others handled this: there is a bug report describing a similar problem [2] and a patch for rocr-runtime [3]. I imagine Spack also has to deal with this in its HIP package [4], so I may look at what they do as well.

[1]: https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1003883-gpuvm-discrete-gpu-code-for-amdkfd-radeon-compute-could-be-ready-for-linux-4-17?p=1004038#post1004038
[2]: https://bugs.gentoo.org/716948
[3]: https://gitweb.gentoo.org/repo/gentoo.git/tree/dev-libs/rocr-runtime/files/rocr-runtime-4.3.0_no-sqlprofiler.patch
[4]: https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/hip/package.py

Sincerely,
Cory Bloor
