
Bug#1065701: marked as done (rocm_agent_enumerator: crash on systems without AMD GPU)



Your message dated Fri, 15 Mar 2024 23:37:46 +0100
with message-id <sa6msqzi0jp.fsf@hjemme.reinholdtsen.name>
and subject line Re: Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
has caused the Debian Bug report #1065701,
regarding rocm_agent_enumerator: crash on systems without AMD GPU
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1065701: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1065701
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: rocminfo
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: cgmb@slerp.xyz

Dear Maintainer,

On some systems, the rocm_agent_enumerator command may crash with an error:

    Traceback (most recent call last):
      File "/usr/bin/rocm_agent_enumerator", line 260, in <module>
        main()
      File "/usr/bin/rocm_agent_enumerator", line 244, in main
        target_list = readFromKFD()
                      ^^^^^^^^^^^^^
      File "/usr/bin/rocm_agent_enumerator", line 193, in readFromKFD
        for node in sorted(os.listdir(topology_dir)):
                           ^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/kfd/kfd/topology/nodes/'

It's not clear to me exactly why this error is emitted. Perhaps it's
because the system does not have an AMD GPU at all. In that case, the
expected output would be "gfx000\n". The purpose of
rocm_agent_enumerator is to list all AMD GPUs on a system. If there are
no AMD GPUs, then it should be an empty list.
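A minimal sketch of the kind of guard that would avoid the traceback,
assuming the missing directory simply means that no KFD devices are
present. The helper name list_kfd_nodes is hypothetical and is not taken
from the actual rocm_agent_enumerator source; only the path comes from
the traceback above:

```python
import os

# Path copied from the traceback in this report.
KFD_TOPOLOGY_DIR = "/sys/class/kfd/kfd/topology/nodes/"


def list_kfd_nodes(topology_dir=KFD_TOPOLOGY_DIR):
    """Return the sorted KFD topology node names.

    Hypothetical helper: if the directory does not exist (assumed here
    to mean that no KFD devices are available), return an empty list
    instead of letting FileNotFoundError propagate.
    """
    try:
        return sorted(os.listdir(topology_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # On a machine without the KFD sysfs tree this prints an empty list
    # and writes nothing to stderr.
    print(list_kfd_nodes())
```

With a guard like this, the caller can decide what to print for an empty
node list without anything being written to stderr.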

This behaviour can be seen in the rocm-hipamd autopkgtests [1]. While
hipcc should probably not be calling rocm_agent_enumerator when the
offload architecture has been manually specified, the
rocm_agent_enumerator shouldn't be emitting any output on stderr.

Sincerely,
Cory Bloor

[1]: https://ci.debian.net/data/autopkgtest/testing/amd64/r/rocm-hipamd/43752739/log.gz

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages rocminfo depends on:
ii  kmod                31+20240202-2
ii  libc6               2.37-15.1
ii  libgcc-s1           14-20240303-1
ii  libhsa-runtime64-1  5.7.1-1
ii  libstdc++6          14-20240303-1
ii  pciutils            1:3.11.1-1
ii  python3             3.11.8-1

rocminfo recommends no packages.

rocminfo suggests no packages.

-- no debconf information

--- End Message ---
--- Begin Message ---
Issue is fixed in unstable.

-- 
Happy hacking
Petter Reinholdtsen

--- End Message ---
