
Bug#1065410: libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035



On Mon, 04 Mar 2024 04:35:50 +0000 Cordell Bloor <cgmb@slerp.xyz> wrote:
> Many tests began failing for the gfx1035 ISA on the Debian ROCm CI upon
> the update to libhsa-runtime64-1 (5.7.1-1). The failure is an assertion:
>
> ./src/image/addrlib/src/gfx10/gfx10addrlib.cpp:1083: virtual rocr::Addr::ChipFamily rocr::Addr::V2::Gfx10Lib::HwlConvertChipFamily(unsigned int, unsigned int): Assertion `false' failed.
>
> The rocblas test logs suggest that this was introduced with the update
> to rocr-runtime 5.7.1-1 [1], as the tests passed before [2]. On Debian
> Testing, it even passed with libhsakm1 (5.7.0-1) [3].
>
> The assertion is complaining that it's not a Rembrandt ASIC [4].
> However, the test system is a Minisforum UM773 Lite with an AMD Ryzen
> 7735 HS (/w AMD Radeon 680M integrated graphics).

This seems to be due to a check on the chipRevision that was added some time between 5.2.3 and 5.7.1. For the APUs, the check ensures that the revision is in the range 0x1 to 0xFF [5]. However, the chipRevision of my Rembrandt APU is 0x00 within this function.

rocminfo reports

  Chip ID:                 5761(0x1681)
  ASIC Revision:           2(0x2)

so I imagine that the chip revision should probably be 2 and the value of 0 is really just because it was never initialized.
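
To illustrate why a revision of 0 trips the assertion while a revision of 2 would not, here is a minimal sketch of that kind of range check. The names below are hypothetical and are not the actual addrlib macros; only the 0x1 to 0xFF range comes from the header at [5], and treating both bounds as inclusive is my assumption.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not the identifiers used in
// amdgpu_asic_addr.h. Only the 0x01..0xFF range is taken from [5].
constexpr std::uint32_t kRembrandtRevMin = 0x01;
constexpr std::uint32_t kRembrandtRevMax = 0xFF;

// Returns true when the revision lies inside the accepted range.
// Assumption: both bounds are inclusive.
constexpr bool is_rembrandt_revision(std::uint32_t chip_revision) {
    return chip_revision >= kRembrandtRevMin &&
           chip_revision <= kRembrandtRevMax;
}

// The value seen inside HwlConvertChipFamily (0x00) is rejected...
static_assert(!is_rembrandt_revision(0x00), "revision 0 is out of range");
// ...while the ASIC Revision that rocminfo reports (0x2) would be accepted.
static_assert(is_rembrandt_revision(0x02), "revision 2 is in range");
```

With a check of this shape, any code path that leaves the revision at 0 falls through to the final assertion regardless of which ASIC is actually present.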

I've reproduced the problem using AMD's prebuilt binaries from ROCm 6.0.2, so this is an issue in the upstream project as well.

Sincerely,
Cory Bloor

> [1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/7826/log.gz
> [2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/4334/log.gz
> [3]: https://ci.rocm.debian.net/data/autopkgtest/testing/amd64+gfx1035/r/rocblas/8115/log.gz
> [4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/gfx10/gfx10addrlib.cpp?ref_type=tags#L1083

[5]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/amdgpu_asic_addr.h#L123

