
Re: Blender crash with packaged ROCm 5.2.3 drivers



Hi Étienne,

I found that the crash is caused by the `libamd-comgr2` package. I had assumed it was necessary, since I remember it being installed with amdgpu-pro. After uninstalling it there is no crash, but Blender still reports no HIP devices:

I1106 23:33:10.011876 123411 device.cpp:32] HIPEW initialization succeeded
I1106 23:33:10.011904 123411 device.cpp:34] Found precompiled kernels
HIP hipGetDeviceCount: No HIP-capable device available
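
One thing that can be ruled out for a "No HIP-capable device" message is missing access to the kernel device nodes. The following is a minimal sketch, assuming the usual ROCm node locations `/dev/kfd` and `/dev/dri/renderD*`; the function name `check_device_nodes` is hypothetical, and a clean result does not prove the HIP runtime itself is set up correctly.

```python
import glob
import os


def check_device_nodes(kfd="/dev/kfd", render_glob="/dev/dri/renderD*"):
    """Report whether the compute device nodes exist and are read/write accessible.

    The default paths are assumptions about a typical ROCm setup.
    """
    report = {}
    for node in [kfd] + sorted(glob.glob(render_glob)):
        if not os.path.exists(node):
            report[node] = "missing"
        elif os.access(node, os.R_OK | os.W_OK):
            report[node] = "ok"
        else:
            # Often a sign that the user is not in the group owning the node.
            report[node] = "no read/write access"
    return report


if __name__ == "__main__":
    for node, status in check_device_nodes().items():
        print(f"{node}: {status}")
```

If a node is reported as inaccessible, comparing its owning group (`ls -l /dev/kfd`) with the output of `id` is a reasonable next step.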

Other findings: the `== hip-clang` section of my hipconfig output does not list any devices, and hipconfig.pl cannot find `/usr/bin/llc`:

InstalledDir: /usr/bin
Can't exec "/usr/bin/llc": No such file or directory at /usr/bin//hipconfig.pl line 180.
hip-clang-cxxflags :  -std=c++11 -isystem "/usr/lib/llvm-15/lib/clang/15.0.4/include/.." -isystem /usr/hsa/include --hip-version=5.2.21153 --rocm-path=/usr -O3
hip-clang-ldflags  :  -L"/usr/lib" --hip-version=5.2.21153 --rocm-path=/usr -O3 -lgcc_s -lgcc -lpthread -lm -lrt
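
To see whether `llc` is installed under another name, a short search can help. This is a minimal sketch, assuming Debian's versioned LLVM layout (`llc-15` on the PATH and `/usr/lib/llvm-15/bin/llc`); the function name `find_llc` and its parameters are hypothetical.

```python
import os
import shutil


def find_llc(version="15", search_path=None, extra_dirs=("/usr/lib/llvm-{v}/bin",)):
    """Return paths of llc executables found on the PATH or in versioned LLVM dirs.

    search_path=None means "use the PATH environment variable".
    The versioned names and directories are assumptions about the packaging.
    """
    found = []
    for name in ("llc", f"llc-{version}"):
        path = shutil.which(name, path=search_path)
        if path and path not in found:
            found.append(path)
    for directory in extra_dirs:
        candidate = os.path.join(directory.format(v=version), "llc")
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in found):
            found.append(candidate)
    return found


if __name__ == "__main__":
    print(find_llc() or "no llc found")
```

If only a versioned binary turns up, that would suggest hipconfig.pl expects an unversioned `/usr/bin/llc` that the installed packages do not provide.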

Example snippet from hipconfig I found on github:

InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
  LLVM version 7.0.0svn
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags :  -hc -std=c++amp -I/opt/rocm/hcc/include
HCC-ldflags  :  -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

I don't know how much of this is caused by me not knowing exactly what to install and configure, and how much by the whole stack being so new and untested.
Best regards,
JJ

On 05-11-2022 00:57, Jakub Jaszewski wrote:
Hi Étienne,

Thanks for taking a look :)
I'll forward your findings to Cycles developers.

Best regards,
JJ

On 05-11-2022 00:13, Étienne Mollier wrote:

Hi Jakub,

Jakub Jaszewski, on 2022-11-04:
Please forgive me if this is not the right place to discuss such issues. My name is Jakub and for my work I use FOSS 3D software: Blender [1], for which AMD recently contributed a HIP compute backend [2] as part of the official support.
I think you're in the right place to discuss ROCm-related topics
in a Debian context.  :)

After ROCm 5.2.3 landed in Debian unstable I gave it a try with Blender, and after some initial hurdles with binary paths I encountered an LLVM error which resulted in a crash. A Blender developer said that this is not something they can fix. The entire issue is documented on the Blender bug tracker [3], where you can find all the details.

The most relevant part of debug log:

I1102 11:33:38.655006 91168 device.cpp:32] HIPEW initialization succeeded
I1102 11:33:38.655035 91168 device.cpp:34] Found precompiled kernels
mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted
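
This kind of LLVM error typically appears when more than one copy of LLVM ends up loaded into the same process. A minimal sketch for listing LLVM-related shared objects from the text of `/proc/<pid>/maps` on Linux follows; the function name `llvm_libraries` and the name patterns (including `libamd_comgr`) are assumptions, and an LLVM copy linked statically into another library would not show up by name.

```python
import re

# File-name patterns that suggest a library carries or links LLVM (assumed list).
LLVM_PATTERN = re.compile(r"libLLVM|libamd_comgr|libclang", re.IGNORECASE)


def llvm_libraries(maps_text):
    """Return the sorted, distinct LLVM-related library paths in a maps listing."""
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        path = fields[5].strip()
        if LLVM_PATTERN.search(path.rsplit("/", 1)[-1]):
            libs.add(path)
    return sorted(libs)


if __name__ == "__main__":
    import sys

    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/maps") as handle:
        for lib in llvm_libraries(handle.read()):
            print(lib)
```

Run with the PID of a Blender process that is still alive (for example, one stopped in a debugger before the abort); two different LLVM-carrying libraries in the output would be consistent with the duplicate option registration.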

I don't know if this qualifies as a bug that should be reported here on the Debian bug tracker or somewhere else. Any help would be greatly appreciated.

[1] https://builder.blender.org/download/daily/
[2] https://developer.blender.org/D12578
[3] https://developer.blender.org/T102018
I have been scratching my head over what changes would be
necessary to properly hide symbols so as to prevent them from
colliding with Mesa; as far as I could see, except for
rocm-smi-lib, library symbols are already filtered.  But it is
quite possible I missed the point and didn't look at the right
things (I've been after .map, .def and package symbols lists).

Other than that, I tried to build a custom blender 3.3.1 version
with HIP Cycles support following the packaging changes
suggested by Cordell Bloor in #1021646[4], to see if I could
reproduce the issue.

[4]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1021646

Apparently I managed to reproduce a crash at about the same step
as the one you observed.  I got a slightly different output when
running blender through the debugger: the Mesa CommandLine Error
does not appear on my end.  Below is the tracing information
from the debugger:

I1105 00:57:15.862457 879154 device.cpp:32] HIPEW initialization succeeded
I1105 00:57:15.862509 879154 device.cpp:34] Found precompiled kernels
[New Thread 0x7fff325ff6c0 (LWP 879325)]

Thread 1 "blender" received signal SIGSEGV, Segmentation fault.
0x00007ffff7c15d95 in ?? () from /lib/x86_64-linux-gnu/libjemalloc.so.2
(gdb) bt
#0  0x00007ffff7c15d95 in  () at /lib/x86_64-linux-gnu/libjemalloc.so.2
#1  0x00007fff32959154 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#2  0x00007fff32960fa8 in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#3  0x00007fff3290f19e in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#4  0x00007fff32952dfe in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#5  0x00007fff326c676c in  () at /lib/x86_64-linux-gnu/libamdhip64.so
#6  0x00007fff326c75ad in hipInit () at /lib/x86_64-linux-gnu/libamdhip64.so
#7  0x0000555557e38824 in ccl::device_hip_safe_init () at ./intern/cycles/device/hip/device.cpp:96
#8  ccl::device_hip_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) (devices=...) at ./intern/cycles/device/hip/device.cpp:104
#9  0x0000555557e20b7a in ccl::Device::available_devices(unsigned int) (mask=34) at ./intern/cycles/device/device.cpp:228
#10 0x0000555557bbbc3d in ccl::available_devices_func(PyObject*, PyObject*) (args=<optimized out>) at ./intern/cycles/blender/python.cpp:416
#11 0x00007fffeff28413 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#12 0x00007fffefedebce in _PyObject_MakeTpCall () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#13 0x00007fffefe79cb4 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#14 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#15 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#16 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#17 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#18 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#19 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#20 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#21 0x00007fffefee31b8 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#22 0x00007fffefe79c63 in _PyEval_EvalFrameDefault () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#23 0x00007fffeffc70c6 in  () at /lib/x86_64-linux-gnu/libpython3.10.so.1.0
#24 0x0000555556ac015f in bpy_class_call (C=0x7fffd967e2b8, ptr=<optimized out>, func=0x55555ac15da0 <rna_Panel_draw_func>, parms=0x7fffffffdca0) at ./source/blender/python/intern/bpy_rna.c:8690
#25 0x0000555556a5da5c in panel_draw (C=<optimized out>, panel=0x7fff439304b8) at ./source/blender/makesrna/intern/rna_ui.c:129
#26 0x0000555556adafab in ed_panel_draw (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff4bc55038, lb=lb@entry=0x7fff4bc55130, pt=pt@entry=0x7fff4b8ca938, panel=0x7fff439304b8, panel@entry=0x0, w=484, em=20, unique_panel_str=0x0, search_filter=0x0) at ./source/blender/editors/screen/area.c:2791
#27 0x0000555556adca43 in ED_region_panels_layout_ex (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff4bc55038, paneltypes=<optimized out>, contexts=contexts@entry=0x7fffffffdf60, category_override=category_override@entry=0x0) at ./source/blender/editors/screen/area.c:2989
#28 0x00005555584a3be5 in userpref_main_region_layout (C=0x7fffd967e2b8, region=0x7fff4bc55038) at ./source/blender/editors/space_userpref/space_userpref.c:128
#29 0x0000555556adbb9e in ED_region_do_layout (C=C@entry=0x7fffd967e2b8, region=region@entry=0x7fff4bc55038) at ./source/blender/editors/screen/area.c:511
#30 0x00005555565543f5 in wm_draw_window_offscreen (stereo=false, win=0x7fff43dd7a78, C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:889
#31 wm_draw_window (win=0x7fff43dd7a78, C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:1111
#32 wm_draw_update (C=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm_draw.c:1338
#33 0x0000555556550f40 in WM_main (C=C@entry=0x7fffd967e2b8) at ./source/blender/windowmanager/intern/wm.c:640
#34 0x0000555555efa1ca in main (argc=2, argv=0x7fffffffe248) at ./source/creator/creator.c:547

In the hope this helps pinpoint what's wrong,


