
Bug#1056172: marked as done (librocprim-tests: Test failures when gfx1030 code is run on gfx1031 hardware)



Your message dated Tue, 26 Mar 2024 13:39:34 -0600
with message-id <4ee4f2f1-c932-e49a-f776-05cbf5d0efbc@slerp.xyz>
and subject line Re: librocprim-tests: Test failures when gfx1030 code is run on gfx1031 hardware
has caused the Debian Bug report #1056172,
regarding librocprim-tests: Test failures when gfx1030 code is run on gfx1031 hardware
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1056172: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1056172
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: librocprim-tests
Version: 5.5.1-2
Severity: important
X-Debbugs-Cc: cgmb@slerp.xyz

Dear Maintainer,

In ROCm 5.5, the rocprim library added functionality that branched on
the detected GPU hardware to select the appropriate implementation of
the library functionality. This does not account for the possibility
of executing code compiled for gfx1030 on gfx103{1,2,3,4,5,6} hardware, so the
lookup fails and incorrect results are produced.
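The failure mode can be illustrated with a minimal sketch. This is not rocprim's actual code: the KernelConfig struct, the lookup_config function, and the tuning values are all hypothetical, and the sketch only assumes that the implementation is chosen by an exact match on the architecture name reported by the device.

```cpp
#include <string_view>

// Hypothetical tuning parameters; not rocprim's real types or values.
struct KernelConfig {
    int  block_size;
    int  items_per_thread;
    bool found;  // false when no entry matches the detected architecture
};

// Hypothetical exact-match lookup keyed on the architecture name that the
// device reports at run time.
constexpr KernelConfig lookup_config(std::string_view detected_arch) {
    if (detected_arch == "gfx906")  return {256, 8, true};
    if (detected_arch == "gfx1030") return {256, 16, true};
    // A gfx1031 device running gfx1030 code still reports "gfx1031",
    // so it falls through to here and no configuration is found.
    return {0, 0, false};
}
```

In this sketch, lookup_config("gfx1030") finds an entry while lookup_config("gfx1031") does not, even though the same compiled code is running on both devices.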

This problem can be seen in the CI results for rocprim on gfx1031 [1],
gfx1032 [2], and gfx1034 [3]. It is probably also the cause of the
rocsparse failures in prune_dense2csr_by_percentage on those architectures [4][5][6].

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1031/r/rocprim/1154/log.gz
[2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1032/r/rocprim/832/log.gz
[3]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1034/r/rocprim/210/log.gz
[4]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1031/r/rocsparse/1156/log.gz
[5]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1032/r/rocsparse/834/log.gz
[6]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1034/r/rocsparse/213/log.gz

-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 6.5.0-2-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages librocprim-tests depends on:
ii  libamdhip64-5  5.2.3-13
ii  libc6          2.37-12
ii  libgcc-s1      13.2.0-5
ii  libstdc++6     13.2.0-5

librocprim-tests recommends no packages.

librocprim-tests suggests no packages.

-- no debconf information

--- End Message ---
--- Begin Message ---
Version: 5.7.1-2

This has been fixed in the most recent version of rocprim on unstable. The default behaviour for code built with librocprim-dev is unchanged from upstream, but the Debian-specific opt-in macro ROCPRIM_USE_ARCH_CONVERSION treats gfx1031 and other RDNA 2 architectures as if they were gfx1030. It is enabled when building the rocprim tests, which now pass on gfx1031 [1].
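As a rough sketch of the idea only (this is not the Debian patch itself): the macro EXAMPLE_USE_ARCH_CONVERSION and the function normalize_arch below are hypothetical stand-ins, and the gfx1030 to gfx1036 range is assumed from the architectures listed in the original report.

```cpp
#include <string_view>

// Hypothetical opt-in switch standing in for a build-time macro such as
// ROCPRIM_USE_ARCH_CONVERSION; the real implementation is not shown here.
#ifdef EXAMPLE_USE_ARCH_CONVERSION
inline constexpr bool arch_conversion_enabled = true;
#else
inline constexpr bool arch_conversion_enabled = false;
#endif

// Map gfx1030..gfx1036 onto gfx1030 when conversion is requested;
// every other name is returned unchanged.
constexpr std::string_view normalize_arch(
    std::string_view arch, bool convert = arch_conversion_enabled) {
    if (!convert) return arch;  // upstream default: no conversion
    if (arch.size() == 7 && arch.substr(0, 6) == "gfx103") {
        const char last = arch[6];
        if (last >= '0' && last <= '6') return "gfx1030";
    }
    return arch;
}
```

With conversion requested, normalize_arch("gfx1031", true) yields "gfx1030", so a subsequent exact-match lookup would succeed; without it, the name passes through untouched, which mirrors the unchanged default described above.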

There were perhaps other ways to do this, but the opt-in was a very safe method and _anything_ we do in Debian is going to be temporary. The proper solution from upstream will be to use the gfx10.3-generic ISA once that is supported in the HIP runtime.

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/packages/r/rocprim/unstable/amd64+gfx1031/10415/
--- End Message ---
