
Re: Building rocRAND with Debian HIP



Hi Cory,

Cordell Bloor, on 2022-06-14:
> On 2022-06-14 14:41, Étienne Mollier wrote:
> > I'm at a point where I manage to build rocrand packages, but I
> > still struggle against googletest at the moment (main issue is
> > to make the build procedure recognize the locally installed one
> > in /usr/src/googletest, otherwise my build chokes on attempting
> > Internet access).  I thought I was missing GTEST_ROOT setting,
> > but this is not sufficient; I will continue tomorrow.
> 
> You don't need to set any flags or variables. You've just installed the
> wrong package. The one you want is libgtest-dev.

Ah, silly me, thanks for the tip, this triggered the build of
the test suite and I could move forward.  :)

> I would also recommend adding --no-parallel to the dh_auto_test arguments.
> It's not really that important for rocrand, but I would just do that by
> default for any library that uses the GPU. It might take a little bit
> longer, but it will result in more reliable tests.

Done on rocm-hipamd and rocrand to make sure several tests are
not colliding at once on a single gpu.  Overall this should make
things more stable, I agree.

After triggering the test suite of rocrand, I see most tests
failing with error messages like the following, e.g. test 23:

	23: Test command: /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/test/test_rocrand_xorwow_prng
	23: Test timeout computed to be: 10000000
	23: Running main() from ./googletest/src/gtest_main.cc
	23: [==========] Running 8 tests from 1 test suite.
	23: [----------] Global test environment set-up.
	23: [----------] 8 tests from rocrand_xorwow_prng_tests
	23: [ RUN      ] rocrand_xorwow_prng_tests.init_test
	23: LoadLib(libhsa-amd-aqlprofile64.so) failed: libhsa-amd-aqlprofile64.so: cannot open shared object file: No such file or directory
	23: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
	"
	23/29 Test #23: test_rocrand_xorwow_prng ............Subprocess aborted***Exception:   0.09 sec

In a past discussion, I understood that the failure to load
libhsa-amd-aqlprofile64.so should be benign (or should otherwise
be skipped), so I believe the issue mainly results from the
hipErrorNoBinaryForGpu.  Full list of failing tests at this time:

	  4 - test_rocrand_basic (Subprocess aborted)
	  5 - test_rocrand_cpp_wrapper (Subprocess aborted)
	  6 - test_rocrand_generate (Subprocess aborted)
	  7 - test_rocrand_generate_log_normal (Subprocess aborted)
	  8 - test_rocrand_generate_normal (Subprocess aborted)
	  9 - test_rocrand_generate_poisson (Subprocess aborted)
	 10 - test_rocrand_generate_uniform (Subprocess aborted)
	 11 - test_rocrand_generator_type (Subprocess aborted)
	 12 - test_rocrand_kernel_mrg32k3a (Subprocess aborted)
	 13 - test_rocrand_kernel_mtgp32 (Subprocess aborted)
	 14 - test_rocrand_kernel_philox4x32_10 (Subprocess aborted)
	 15 - test_rocrand_kernel_sobol32 (Subprocess aborted)
	 16 - test_rocrand_kernel_sobol64 (Subprocess aborted)
	 17 - test_rocrand_kernel_xorwow (Subprocess aborted)
	 18 - test_rocrand_mrg32k3a_prng (Subprocess aborted)
	 19 - test_rocrand_mtgp32_prng (Subprocess aborted)
	 20 - test_rocrand_philox_prng (Subprocess aborted)
	 21 - test_rocrand_sobol32_qrng (Subprocess aborted)
	 22 - test_rocrand_sobol64_qrng (Subprocess aborted)
	 23 - test_rocrand_xorwow_prng (Subprocess aborted)
	 25 - test_hiprand_api (Subprocess aborted)
	 26 - test_hiprand_cpp_wrapper (Subprocess aborted)
	 27 - test_hiprand_kernel (Subprocess aborted)

I tried various things in rocrand and rocm-hipamd to attempt to
enable the gfx803 architecture, as I was under the impression
that the existing packaging was mainly targeting gfx906, but
subsequent build attempts failed with:

	hipcc-cmd: /usr/bin/clang++-14  -std=c++11 -isystem "/usr/lib/llvm-14/lib/clang/14.0.5/include/.." -Xclang -fallow-half-arguments-and-returns -D__HIP_HCC_COMPAT_MODE__=1 -isystem /usr/hsa/include --offload-arch='gfx803:xnack-' -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false --hip-device-lib-path="/usr/lib/x86_64-linux-gnu/amdgcn/bitcode" -fhip-new-launch-api '--hip-version=5.0.0'  -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -specs=/usr/share/dpkg/no-pie-compile.specs -Wdate-time -o "CMakeFiles/cmTC_db3da.dir/testCXXCompiler.cxx.o" -c -x hip /<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
	clang: error: invalid target ID 'gfx803:xnack-'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')

which sounds rather odd since the format specified in the
--offload-arch argument looks to match the textual
specification.  But I believe that is a dead end.  I also tried
adjusting rocr-runtime, but rolled back my changes regarding
aqlprofile as they had no effect.  I suppose I must have missed
something somewhere, but that's my status for the moment, in
case you have an idea.
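
To make the target ID format from the clang error easier to reason
about, here is a minimal shell sketch that splits such a string
into its processor name and its colon-delimited features.  The
split_target_id function is a hypothetical helper written for this
mail, not something provided by hipcc or clang.  It only checks the
shape of the string, and 'gfx803:xnack-' passes that check.  One
possible reading, which I have not verified against clang 14, is
that the error is about the xnack feature not being accepted for
the gfx803 processor rather than about the syntax itself.

```shell
# Hypothetical helper for illustration only; not part of hipcc or clang.
# Splits a target ID such as 'gfx908:sramecc+:xnack-' into the processor
# name and one line per feature.
split_target_id() {
    target_id=$1
    # Everything before the first colon is the processor name.
    processor=${target_id%%:*}
    # Everything after the first colon is the feature list, if any.
    case $target_id in
        *:*) features=${target_id#*:} ;;
        *)   features= ;;
    esac
    printf 'processor=%s\n' "$processor"
    # Split the feature list on colons and print each entry.
    old_ifs=$IFS
    IFS=:
    for feature in $features; do
        printf 'feature=%s\n' "$feature"
    done
    IFS=$old_ifs
}

split_target_id 'gfx803:xnack-'
split_target_id 'gfx908:sramecc+:xnack-'
```

If the feature is indeed the problem, passing plain 'gfx803' as
the architecture would be the next thing to try, but that is an
assumption on my side at this point.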

Have a nice day,  :)
-- 
Étienne Mollier <emollier@emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/3, please excuse my verbosity.
On air: Orion Dust - CXXVI
