
Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU



Package: rocminfo
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: cgmb@slerp.xyz

Dear Maintainer,

On some systems, the rocm_agent_enumerator command may crash with an error:

    Traceback (most recent call last):
      File "/usr/bin/rocm_agent_enumerator", line 260, in <module>
        main()
      File "/usr/bin/rocm_agent_enumerator", line 244, in main
        target_list = readFromKFD()
                      ^^^^^^^^^^^^^
      File "/usr/bin/rocm_agent_enumerator", line 193, in readFromKFD
        for node in sorted(os.listdir(topology_dir)):
                           ^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/kfd/kfd/topology/nodes/'

It's not clear to me exactly why this directory is missing. Perhaps it's
because the system does not have an AMD GPU at all. In that case, the
expected output would be "gfx000\n". The purpose of
rocm_agent_enumerator is to list all AMD GPUs on a system. If there are
no AMD GPUs, then it should report an empty list rather than crash.
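
As a rough illustration of the behaviour I'd expect, here is a minimal
sketch that treats a missing topology directory as "no nodes" instead of
raising. The helper name list_kfd_nodes is hypothetical and is not taken
from the rocm_agent_enumerator source; whatever readFromKFD does with
each node afterwards is omitted, and this is not a proposed patch.

```python
import os

# Path taken from the traceback above.
TOPOLOGY_DIR = "/sys/class/kfd/kfd/topology/nodes/"


def list_kfd_nodes(topology_dir=TOPOLOGY_DIR):
    """Return the sorted node names, or [] if the directory is absent.

    Hypothetical helper: a missing directory is assumed to mean that
    no KFD topology (and so no AMD GPU) is exposed on this system.
    """
    try:
        return sorted(os.listdir(topology_dir))
    except FileNotFoundError:
        # No topology directory: report no nodes instead of crashing.
        return []
```

With something like this, the loop at line 193 would iterate over an
empty list on such systems and nothing would be written to stderr.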

This behaviour can be seen in the rocm-hipamd autopkgtests [1]. While
hipcc should probably not be calling rocm_agent_enumerator when the
offload architecture has been manually specified,
rocm_agent_enumerator shouldn't be emitting any output on stderr.

Sincerely,
Cory Bloor

[1]: https://ci.debian.net/data/autopkgtest/testing/amd64/r/rocm-hipamd/43752739/log.gz

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages rocminfo depends on:
ii  kmod                31+20240202-2
ii  libc6               2.37-15.1
ii  libgcc-s1           14-20240303-1
ii  libhsa-runtime64-1  5.7.1-1
ii  libstdc++6          14-20240303-1
ii  pciutils            1:3.11.1-1
ii  python3             3.11.8-1

rocminfo recommends no packages.

rocminfo suggests no packages.

-- no debconf information

