Hi Cory,

Many thanks for your help all along. I think I managed to get the
rocrand package to properly smoke test rocm-hipamd, in simulated
buildd conditions.

Cordell Bloor, on 2022-06-15:
> The hipErrorNoBinaryForGpu error is the problem causing your test failure.
> There are a number of 'code objects' (ELF files) embedded in librocrand.so.
> IIRC, when you launch a kernel, the HIP AMD runtime will load the
> corresponding code objects for that function for each AMD GPU installed in
> your system. If it can't find one (e.g., because the library was built for
> gfx906 but your hardware is gfx803), then you will encounter that error.

Okay, I get it: specifying the target seems to allow some
optimizations, but since we want to target all possible
architectures, I defer to what you mentioned below about leaving
AMDGPU_TARGETS unset.

> It would be nice if HIP printed more information by default, but you can get
> it to emit more details about what it was looking for by setting the
> AMD_LOG_LEVEL environment variable [1]. Something like export
> AMD_LOG_LEVEL=4 should do the trick.
>
> You can also use roc-obj and roc-obj-ls to inspect the code objects within a
> binary [2].

Thanks for the tips, AMD_LOG_LEVEL=4 has been helpful to me. I'll
try to remember roc-obj and roc-obj-ls for picking apart code
objects, should the need arise.

> The gfx803 architecture doesn't support the xnack feature, so it's an error
> to include it in the target id. You can see which architectures support
> xnack or sramecc by checking the AMDGPU Processors table in the LLVM
> documentation [3].

Okay, this explains the error I hit yesterday when I attempted to
adjust the architecture manually.

> To compile for gfx803 the CMake argument -DAMDGPU_TARGETS=gfx803 is
> sufficient. If left unset, roc{RAND,PRIM,BLAS,SOLVER,SPARSE,FFT} will
> compile for a default set of architectures. In the case of rocRAND, that
> would be gfx803, gfx900:xnack-, gfx906:xnack-, gfx908:xnack-, gfx90a:xnack-,
> gfx90a:xnack+ and gfx1030 [4]. Those are the architectures that are used for
> the binaries that AMD distributes directly.

Leaving the variable unset did the trick. I had one test failure,
in test #3, about the Poisson distribution, but I believe the
flakiness of that test lies in the very nature of generating random
values (either that, or I have to question my RNG):

3: [ RUN ] poisson_distribution_tests/poisson_distribution_tests.histogram_compare/1
3: ./test/test_poisson_distribution.cpp:128: Failure
3: The difference between h0 and h1 is 220, which exceeds std::max(samples_count * 1e-3, std::max(h0, h1) * 1e-1), where
3: h0 evaluates to 1755,
3: h1 evaluates to 1975, and
3: std::max(samples_count * 1e-3, std::max(h0, h1) * 1e-1) evaluates to 200.
3: [ FAILED ] poisson_distribution_tests/poisson_distribution_tests.histogram_compare/1, where GetParam() = 5.5 (104 ms)

rocm-hipamd has been lingering on Salsa for a while; I will try to
get it into shape for upload, hopefully before too long. Having the
rocrand test suite working will greatly help to check that the end
result has no obvious issues.

Have a nice day, :)
-- 
Étienne Mollier <emollier@emlwks999.eu>
Fingerprint: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.