
Re: Building rocRAND with Debian HIP



Hi Cory,

Many thanks for your help all along.  I think I managed to get
the rocrand package to properly smoke test rocm-hipamd, in
simulated buildd conditions.

Cordell Bloor, on 2022-06-15:
> The hipErrorNoBinaryForGpu error is the problem causing your test failure.
> There are a number of 'code objects' (ELF files) embedded in librocrand.so.
> IIRC, when you launch a kernel, the HIP AMD runtime will load the
> corresponding code objects for that function for each AMD GPU installed in
> your system. If it can't find one (e.g., because the library was built for
> gfx906 but your hardware is gfx803), then you will encounter that error.

Okay, I get it.  Specifying the target seems to allow some
optimizations, but since we want to target all possible
architectures, I defer to what you mentioned below about
leaving AMDGPU_TARGETS unset.

> It would be nice if HIP printed more information by default, but you can get
> it to emit more details about what it was looking for by setting the
> AMD_LOG_LEVEL environment variable [1]. Something like export
> AMD_LOG_LEVEL=4 should do the trick.
> 
> You can also use roc-obj and roc-obj-ls to inspect the code objects within a
> binary [2].

Thanks for the tips, AMD_LOG_LEVEL=4 has been helpful to me.
I'll try to remember roc-obj and roc-obj-ls for unpacking the
code objects, should the need arise.

> The gfx803 architecture doesn't support the xnack feature, so it's an error
> to include it in the target id. You can see which architectures support
> xnack or sramecc by checking the AMDGPU Processors table in the LLVM
> documentation [3].

Okay, this explains the error I hit yesterday when I attempted
to manually adjust the architecture.
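
The rule quoted above can be pictured with a small C++ sketch.
Everything in it is hypothetical: the function name and the table
are made up for illustration, and the table only lists the
processors that the default rocRAND target list quoted in this
thread qualifies with an xnack setting.  The authoritative source
remains the AMDGPU Processors table in the LLVM documentation.

```cpp
#include <set>
#include <string>

// Hypothetical table: processors that appear with an xnack setting in the
// default target list quoted in this thread. Not taken from any real API.
static const std::set<std::string> kXnackCapable = {
    "gfx900", "gfx906", "gfx908", "gfx90a"};

// Returns false when a target id such as "gfx803:xnack-" names the xnack
// feature for a processor that is not in the table above.
bool xnack_setting_allowed(const std::string& target_id)
{
    // The processor name is everything before the first ':' (or the whole
    // string when no feature is given).
    const std::string::size_type colon = target_id.find(':');
    const std::string processor = target_id.substr(0, colon);

    const bool mentions_xnack =
        target_id.find(":xnack") != std::string::npos;

    return !mentions_xnack || kXnackCapable.count(processor) > 0;
}
```

Under these assumptions, "gfx803" passes while "gfx803:xnack-" is
rejected, which matches the kind of error described above.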

> To compile for gfx803 the CMake argument -DAMDGPU_TARGETS=gfx803 is
> sufficient. If left unset, roc{RAND,PRIM,BLAS,SOLVER,SPARSE,FFT} will
> compile for a default set of architectures. In the case of rocRAND, that
> would be gfx803, gfx900:xnack-, gfx906:xnack-, gfx908:xnack-, gfx90a:xnack-,
> gfx90a:xnack+ and gfx1030 [4]. Those are the architectures that are used for
> the binaries that AMD distributes directly.

Unsetting the variable did the trick.  I had one test failure in
test #3, about the Poisson distribution, but I believe that
flakiness is inherent in generating random values (either that,
or I have to question my RNG):

3: [ RUN      ] poisson_distribution_tests/poisson_distribution_tests.histogram_compare/1
3: ./test/test_poisson_distribution.cpp:128: Failure
3: The difference between h0 and h1 is 220, which exceeds std::max(samples_count * 1e-3, std::max(h0, h1) * 1e-1), where
3: h0 evaluates to 1755,
3: h1 evaluates to 1975, and
3: std::max(samples_count * 1e-3, std::max(h0, h1) * 1e-1) evaluates to 200.
3: [  FAILED  ] poisson_distribution_tests/poisson_distribution_tests.histogram_compare/1, where GetParam() = 5.5 (104 ms)
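
For reference, the arithmetic behind that failure can be replayed
with a small C++ sketch.  The tolerance expression is transcribed
from the log above; the function names, the integer types and
samples_count = 200000 are assumptions.  The last one is inferred:
std::max(h0, h1) * 1e-1 is only 197.5, so the 200 in the log has
to come from samples_count * 1e-3.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>

// Tolerance expression as printed in the failure message:
// std::max(samples_count * 1e-3, std::max(h0, h1) * 1e-1)
double histogram_tolerance(std::size_t samples_count, long h0, long h1)
{
    return std::max(samples_count * 1e-3, std::max(h0, h1) * 1e-1);
}

// True when two histogram bins are within the tolerance of each other.
bool bins_agree(std::size_t samples_count, long h0, long h1)
{
    return std::labs(h0 - h1) <= histogram_tolerance(samples_count, h0, h1);
}

// Replays the values from the log; samples_count is inferred, not logged.
void check_logged_values()
{
    const std::size_t samples_count = 200000;  // assumption, see above
    const double tolerance = histogram_tolerance(samples_count, 1755, 1975);
    assert(std::fabs(tolerance - 200.0) < 1e-9);
    assert(!bins_agree(samples_count, 1755, 1975));  // 220 > 200
}
```

With the logged values, the difference of 220 exceeds the
tolerance of 200 by 20, so the comparison fails.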

rocm-hipamd has been lingering on salsa for a while; I will try
to get it into shape for upload, hopefully before too long.
Having the rocrand test suite working will greatly help to check
that the end result has no obvious issues.

Have a nice day,  :)
-- 
Étienne Mollier <emollier@emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.
