Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
Package: rocminfo
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: cgmb@slerp.xyz
Dear Maintainer,
On some systems, the rocm_agent_enumerator command may crash with an error:
Traceback (most recent call last):
  File "/usr/bin/rocm_agent_enumerator", line 260, in <module>
    main()
  File "/usr/bin/rocm_agent_enumerator", line 244, in main
    target_list = readFromKFD()
                  ^^^^^^^^^^^^^
  File "/usr/bin/rocm_agent_enumerator", line 193, in readFromKFD
    for node in sorted(os.listdir(topology_dir)):
                       ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/kfd/kfd/topology/nodes/'
It's not clear to me exactly why this error is emitted. Perhaps it's
because the system does not have an AMD GPU at all. In that case, the
expected output would be "gfx000\n". The purpose of
rocm_agent_enumerator is to list all AMD GPUs on a system. If there are
no AMD GPUs, then it should be an empty list.
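A minimal sketch of the expected behaviour follows. It assumes that a
missing topology directory simply means that no AMD GPU is present. The
helper names list_kfd_nodes and format_targets are hypothetical and are
not taken from the upstream script; only the directory path and the
"gfx000\n" output come from this report.

```python
import os

# Path copied from the traceback in this report.
TOPOLOGY_DIR = "/sys/class/kfd/kfd/topology/nodes/"


def list_kfd_nodes(topology_dir=TOPOLOGY_DIR):
    """Return the sorted node names, or [] if the directory is absent.

    Hypothetical helper; this is not the upstream readFromKFD().
    """
    try:
        return sorted(os.listdir(topology_dir))
    except (FileNotFoundError, NotADirectoryError):
        # Assumption: no KFD topology means there are no AMD GPUs,
        # so report an empty list instead of raising.
        return []


def format_targets(gpu_targets):
    """Build the output text: gfx000 first, then one target per line."""
    return "".join(name + "\n" for name in ["gfx000", *gpu_targets])


if __name__ == "__main__":
    # This sketch only shows the missing-directory case; mapping node
    # entries to gfx target names is left to the real script.
    if not list_kfd_nodes():
        print(format_targets([]), end="")
```

With this approach, a system without the topology directory would print
only "gfx000" on stdout and nothing on stderr.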
This behaviour can be seen in the rocm-hipamd autopkgtests [1]. While
hipcc should probably not be calling rocm_agent_enumerator when the
offload architecture has been manually specified, rocm_agent_enumerator
shouldn't be emitting any output on stderr.
Sincerely,
Cory Bloor
[1]: https://ci.debian.net/data/autopkgtest/testing/amd64/r/rocm-hipamd/43752739/log.gz
-- System Information:
Debian Release: trixie/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect
Versions of packages rocminfo depends on:
ii kmod 31+20240202-2
ii libc6 2.37-15.1
ii libgcc-s1 14-20240303-1
ii libhsa-runtime64-1 5.7.1-1
ii libstdc++6 14-20240303-1
ii pciutils 1:3.11.1-1
ii python3 3.11.8-1
rocminfo recommends no packages.
rocminfo suggests no packages.
-- no debconf information