
Bug#1053864: libdrm-amdgpu1: gpu crash on graphics start with Radeon 760M (both sway and gdm3)



Package: libdrm-amdgpu1
Version: 2.4.115-1
Severity: normal
X-Debbugs-Cc: icefox@dreamquest.io

Dear Maintainer,

When GDM3 starts, or when I disable GDM3, log in on the console by hand,
and then start sway or another WM, the graphics mode switch often hangs
for a few seconds on an unresponsive black screen, then drops back to
a text console for an instant and tries again.  This repeats 0-3 times
before the switch finally succeeds: sometimes it works on the first try,
often on the second, and so on.

Once Sway, or GDM3 and Xorg, have actually started, everything *seems*
perfectly stable so far.

This is a brand new GPU chipset afaik so graphics bugs are pretty
understandable.

CPU: AMD Ryzen 5 7640U w/ Radeon 760M Graphics
Extended renderer info from `glxinfo`:
    Device: AMD Radeon Graphics (gfx1103_r1, LLVM 16.0.6, DRM 3.54, 6.5.0-1-amd64) (0x15bf)
    Version: 23.2.1

I also see the following errors in dmesg associated with the
apparent-crash-and-restart:

[   26.625039] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=23, emitted seq=25
[   26.625482] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[   26.625820] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[   26.810595] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   26.810761] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   26.944169] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   26.944310] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.077693] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.077834] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.211163] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.211303] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.344634] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.344776] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.478028] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.478175] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.611499] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.611640] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.744960] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.745097] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.878425] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.878564] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[   27.880086] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[   27.909811] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[   27.910426] [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
[   27.910540] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[   27.911480] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[   27.913327] [drm] DMUB hardware initialized: version=0x08000E00
[   27.918776] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[   27.921376] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[   27.923969] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[   27.926566] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[   27.934650] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:264
[   27.937248] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:272
[   27.939841] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:280
[   27.942439] [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:288
[   28.328853] [drm] kiq ring mec 3 pipe 1 q 0
[   28.331133] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[   28.331252] amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[   28.331965] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[   28.331968] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[   28.331971] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[   28.331973] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[   28.331975] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[   28.331977] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[   28.331979] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[   28.331981] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[   28.331983] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[   28.331985] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[   28.331987] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[   28.331990] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[   28.331992] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[   28.334786] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
[   28.334791] amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
[   28.334933] [drm] Skip scheduling IBs!
[   28.334955] [drm] Skip scheduling IBs!
[   28.334964] [drm] Skip scheduling IBs!
[   28.334971] [drm] Skip scheduling IBs!
[   28.334979] [drm] Skip scheduling IBs!
[   28.334987] [drm] Skip scheduling IBs!
[   28.334995] [drm] Skip scheduling IBs!
[   28.335006] [drm] Skip scheduling IBs!
[   28.335014] [drm] Skip scheduling IBs!
[   28.335070] [drm] Skip scheduling IBs!
[   28.335079] [drm] Skip scheduling IBs!
[   28.335085] [drm] Skip scheduling IBs!
[   28.336265] [drm] ring gfx_32776.1.1 was added
[   28.337256] [drm] ring compute_32776.2.2 was added
[   28.338182] [drm] ring sdma_32776.3.3 was added
[   28.338234] [drm] ring gfx_32776.1.1 ib test pass
[   28.338272] [drm] ring compute_32776.2.2 ib test pass
[   28.338470] [drm] ring sdma_32776.3.3 ib test pass
[   28.339726] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
[   28.518882] [drm] Skip scheduling IBs!
[   28.518892] [drm] Skip scheduling IBs!
[   28.518897] [drm] Skip scheduling IBs!
[   28.520085] [drm] Skip scheduling IBs!
[   28.521361] [drm] Skip scheduling IBs!
[   28.541083] [drm] Skip scheduling IBs!
[   28.541114] [drm] Skip scheduling IBs!
[   28.541143] [drm] Skip scheduling IBs!
[   28.541159] [drm] Skip scheduling IBs!
[   28.541173] [drm] Skip scheduling IBs!
[   28.541193] [drm] Skip scheduling IBs!
[   28.541215] [drm] Skip scheduling IBs!
[   28.541219] [drm] Skip scheduling IBs!
[   28.541239] [drm] Skip scheduling IBs!
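
For anyone comparing similar logs, here is a minimal sketch of how the
reset cycle above could be summarized from saved dmesg output.  The
function name summarize_resets and the SAMPLE excerpt are made up for
illustration; the message strings it matches are copied from the log
above, and other kernel versions may word them differently.

```python
import re

# Matches the dmesg timestamp prefix, e.g. "[   26.625820] ".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Final success line, e.g. "GPU reset(1) succeeded!". This deliberately does
# not match the intermediate "GPU reset succeeded, trying to resume" line.
DONE_RE = re.compile(r"GPU reset\(\d+\) succeeded!")


def summarize_resets(dmesg_text):
    """Return one (begin, end, mes_failures) tuple per completed GPU reset.

    Hypothetical helper: assumes the message wording seen in this report.
    """
    resets = []
    begin = None
    mes_failures = 0
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "GPU reset begin!" in msg:
            begin, mes_failures = ts, 0
        elif begin is not None and "MES failed to response" in msg:
            mes_failures += 1
        elif begin is not None and DONE_RE.search(msg):
            resets.append((begin, ts, mes_failures))
            begin = None
    return resets


# Shortened excerpt of the log above, for illustration only.
SAMPLE = """\
[   26.625820] amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
[   26.810595] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[   27.880086] amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
[   27.909811] amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
[   28.339726] amdgpu 0000:c1:00.0: amdgpu: GPU reset(1) succeeded!
"""

for begin, end, failures in summarize_resets(SAMPLE):
    print(f"reset took {end - begin:.3f}s with {failures} MES failure(s)")
```

Running it over the output of several boots would show how many reset
cycles occur before the mode switch succeeds.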


I also get the following errors in dmesg from time to time, but they have no visible impact so far:

[ 1046.269344] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1056.509203] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1066.749132] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1076.988590] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1087.228896] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1094.983205] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[ 1097.468792] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1107.708726] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1117.948141] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1128.188485] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1138.428402] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1148.668306] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1158.908169] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1169.147619] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1179.387933] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
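
The timestamps above look evenly spaced.  A small sketch to check the
spacing, assuming the same message wording; the name timeout_gaps and
the SAMPLE excerpt are made up for illustration, and this says nothing
about the cause of the timeouts:

```python
import re

# Timestamp followed by the DMUB AUX transfer message tag seen in this report.
TIMEOUT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+"
    r"\[drm:amdgpu_dm_process_dmub_aux_transfer_sync \[amdgpu\]\]"
)


def timeout_gaps(dmesg_text):
    """Return the gaps in seconds between consecutive AUX timeout messages.

    Hypothetical helper; unrelated lines are skipped.
    """
    stamps = []
    for line in dmesg_text.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            stamps.append(float(m.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Shortened excerpt of the log above, for illustration only.
SAMPLE = """\
[ 1046.269344] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1056.509203] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
[ 1094.983205] i2c_hid_acpi i2c-FRMW0005:00: i2c_hid_get_input: incomplete report (7/65535)
[ 1066.749132] [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
"""

for gap in timeout_gaps(SAMPLE):
    print(f"{gap:.2f}s since previous timeout")
```

On the excerpt, each gap comes out at roughly 10.24 seconds.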


Thank you; hopefully this info is useful to someone!

Simon Heath


-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.5.0-1-amd64 (SMP w/12 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libdrm-amdgpu1 depends on:
ii  libc6    2.37-12
ii  libdrm2  2.4.115-1

libdrm-amdgpu1 recommends no packages.

libdrm-amdgpu1 suggests no packages.

-- no debconf information

