Bug#1065410: libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035
On Mon, 04 Mar 2024 04:35:50 +0000 Cordell Bloor <cgmb@slerp.xyz> wrote:
> Many tests began failing for the gfx1035 ISA on the Debian ROCm CI upon
> the update to libhsa-runtime64-1 (5.7.1-1). The failure is an assertion:
>
> ./src/image/addrlib/src/gfx10/gfx10addrlib.cpp:1083: virtual
> rocr::Addr::ChipFamily
> rocr::Addr::V2::Gfx10Lib::HwlConvertChipFamily(unsigned int, unsigned
> int): Assertion `false' failed.
>
> The rocblas test logs suggest that this was introduced with the update
> to rocr-runtime 5.7.1-1 [1], as the tests passed before [2]. On Debian
> Testing, it even passed with libhsakm1 (5.7.0-1) [3].
>
> The assertion is complaining that it's not a Rembrandt ASIC [4].
> However, the test system is a Minisforum UM773 Lite with an AMD Ryzen
> 7735 HS (/w AMD Radeon 680M integrated graphics).
This seems to be due to a check on the chipRevision that was added
some time between 5.2.3 and 5.7.1. For the APUs, the check ensures that
the revision is in the range 0x01 to 0xFF [5]. However, the
chipRevision of my Rembrandt APU is 0x00 within this function.
rocminfo reports

  Chip ID: 5761(0x1681)
  ASIC Revision: 2(0x2)

so I imagine that the chip revision should probably be 2, and that the
value of 0 only appears because the field was never initialized.
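
To illustrate why a revision of 0x00 trips the assertion, here is a
minimal sketch of a range check of this kind. The names below are
hypothetical and are not the identifiers used in amdgpu_asic_addr.h;
only the bounds 0x01 and 0xFF are taken from the description above.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; the real header defines its
// own macros for this check.
constexpr std::uint32_t kApuRevisionMin = 0x01;
constexpr std::uint32_t kApuRevisionMax = 0xFF;

// True when chip_revision lies in the closed range [0x01, 0xFF].
constexpr bool revision_in_apu_range(std::uint32_t chip_revision) {
    return chip_revision >= kApuRevisionMin &&
           chip_revision <= kApuRevisionMax;
}

// The revision that rocminfo reports (2) would pass the check...
static_assert(revision_in_apu_range(0x02), "revision 2 is in range");
// ...but the value actually seen inside the function (0) does not.
static_assert(!revision_in_apu_range(0x00), "revision 0 is out of range");
```

If the real check behaves like this sketch, any code path that leaves
the revision at zero would fall through to the failing assertion even
on genuine Rembrandt hardware.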
I've reproduced the problem using AMD's prebuilt binaries from ROCm
6.0.2, so this is an issue in the upstream project as well.
Sincerely,
Cory Bloor
> [1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/7826/log.gz
> [2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/4334/log.gz
> [3]: https://ci.rocm.debian.net/data/autopkgtest/testing/amd64+gfx1035/r/rocblas/8115/log.gz
> [4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/gfx10/gfx10addrlib.cpp?ref_type=tags#L1083
[5]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/amdgpu_asic_addr.h#L123