
Bug#1065410: marked as done (libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035)



Your message dated Mon, 18 Mar 2024 19:00:04 -0600
with message-id <8c73057e-5b09-4f2d-b298-a6a7c49a68ed@slerp.xyz>
and subject line Re: assertion in gfx10addrlib.cpp on gfx1035
has caused the Debian Bug report #1065410,
regarding libhsa-runtime64-1: assertion in gfx10addrlib.cpp on gfx1035
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1065410: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1065410
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: libhsa-runtime64-1
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: cgmb@slerp.xyz

Dear Maintainer,

Many tests began failing for the gfx1035 ISA on the Debian ROCm CI upon
the update to libhsa-runtime64-1 (5.7.1-1). The failure is an assertion:

./src/image/addrlib/src/gfx10/gfx10addrlib.cpp:1083: virtual rocr::Addr::ChipFamily rocr::Addr::V2::Gfx10Lib::HwlConvertChipFamily(unsigned int, unsigned int): Assertion `false' failed.

The rocblas test logs suggest that this was introduced with the update
to rocr-runtime 5.7.1-1 [1], as the tests passed before [2]. On Debian
Testing, it even passed with libhsakmt1 (5.7.0-1) [3].

The assertion is complaining that it's not a Rembrandt ASIC [4].
However, the test system is a Minisforum UM773 Lite with an AMD Ryzen
7 7735HS (with AMD Radeon 680M integrated graphics). That's Rembrandt.

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/7826/log.gz
[2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx1035/r/rocblas/4334/log.gz
[3]: https://ci.rocm.debian.net/data/autopkgtest/testing/amd64+gfx1035/r/rocblas/8115/log.gz
[4]: https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.7.1-1/src/image/addrlib/src/gfx10/gfx10addrlib.cpp?ref_type=tags#L1083

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages libhsa-runtime64-1 depends on:
ii  libc6           2.38-6
ii  libdrm-amdgpu1  2.4.120-2
ii  libdrm2         2.4.120-2
ii  libelf1         0.190-1+b1
ii  libgcc-s1       14-20240221-2.1
ii  libhsakmt1      5.7.0-1
ii  libstdc++6      14-20240221-2.1

libhsa-runtime64-1 recommends no packages.

libhsa-runtime64-1 suggests no packages.

-- no debconf information

--- End Message ---
--- Begin Message ---
Version: 5.7.1-2

A query for the ASIC revision has been added to ensure that it is initialized. This appears to fix the problem on Rembrandt (gfx1035) and has not had any obvious negative effects on other hardware. As such, I've uploaded the fix to unstable in 5.7.1-2.
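
To illustrate the failure mode described above, here is a minimal C++ sketch of how a family/revision check can reject a genuine Rembrandt part when the revision field was never populated. This is not the addrlib or rocr-runtime code: every name (AsicInfo, classify, require_known, with_queried_revision) and every constant value is a hypothetical placeholder chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a chip-family lookup.
// Names and numeric values are illustrative assumptions only.
enum class ChipKind { Unknown, Rembrandt };

struct AsicInfo {
    std::uint32_t family;
    std::uint32_t revision;  // 0 here stands in for "never queried"
};

constexpr std::uint32_t kExampleFamily = 0x92;  // placeholder family id
constexpr std::uint32_t kExampleMinRev = 0x01;  // placeholder lower bound
constexpr std::uint32_t kExampleMaxRev = 0xFF;  // placeholder upper bound (exclusive)

// Classify the ASIC from both its family and its revision.
// A correct family with an unset revision falls through to Unknown.
ChipKind classify(const AsicInfo& info) {
    if (info.family == kExampleFamily &&
        info.revision >= kExampleMinRev &&
        info.revision < kExampleMaxRev) {
        return ChipKind::Rembrandt;
    }
    return ChipKind::Unknown;
}

// Stand-in for the reported assertion: aborts when the chip is unrecognized.
void require_known(const AsicInfo& info) {
    assert(classify(info) != ChipKind::Unknown);
}

// Sketch of the idea behind the fix: fill in the revision from a query
// result before classifying, so the field is always initialized.
AsicInfo with_queried_revision(AsicInfo info, std::uint32_t queried_revision) {
    info.revision = queried_revision;
    return info;
}
```

In this model, an AsicInfo with the right family but revision 0 is classified as Unknown, and the same struct passes the check once a queried revision is filled in.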

Sincerely,
Cory Bloor

--- End Message ---
