
Bug#1005005: linux-image-5.15.0-3-amd64: suspend failure with amdgpu



Control: tags -1 + moreinfo

Hi Dominique,

On Sat, Feb 05, 2022 at 11:33:33AM +0100, Dominique Dumont wrote:
> Package: src:linux
> Version: 5.15.15-2
> Severity: normal
> Tags: upstream
> 
> Dear Maintainer,
> 
> 
> Since upgrading to linux-image-5.15.0-3-amd64, suspending my machine no
> longer works correctly: the screen goes blank as usual, but comes back
> after 10s or so.
> 
> The most relevant kernel logs are:
> 
> [  257.531771] PM: suspend entry (s2idle)
> [  257.610570] Filesystems sync: 0.078 seconds
> [  257.610723] (NULL device *): firmware: direct-loading firmware regulatory.db
> [  257.610726] (NULL device *): firmware: direct-loading firmware regulatory.db.p7s
> [  257.610745] (NULL device *): firmware: direct-loading firmware intel/ibt-17-16-1.ddc
> [  257.610954] (NULL device *): firmware: direct-loading firmware intel/ibt-17-16-1.sfi
> [  257.610986] (NULL device *): firmware: direct-loading firmware iwlwifi-9000-pu-b0-jf-b0-46.ucode
> [  257.611211] (NULL device *): firmware: direct-loading firmware i915/kbl_dmc_ver1_04.bin
> [  257.726247] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [  257.728699] OOM killer disabled.
> [  257.728700] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [  257.730085] printk: Suspending console(s) (use no_console_suspend to debug)
> [  257.839817] amdgpu: 
>                 last message was failed ret is 65535
> [  257.839842] amdgpu: 
>                 failed to send message 261 ret is 65535 
> 
> [ ... lots of failed messages ...]
> 
> [  257.840748] ------------[ cut here ]------------
> [  257.840751] WARNING: CPU: 4 PID: 58 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2014 dm_suspend+0x19e/0x1c0 [amdgpu]
> [  257.841665] Modules linked in: rfcomm xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc xfrm_user xfrm_algo nvme_fabrics typec_displayport cmac algif_hash algif_skcipher af_alg overlay bnep binfmt_misc nls_ascii nls_cp437 squashfs vfat fat loop x86_pkg_temp_thermal intel_powerclamp mei_hdcp snd_sof_pci_intel_cnl coretemp dell_rbtn intel_rapl_msr snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci kvm_intel snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof soundwire_bus btusb btrtl kvm snd_ctl_led snd_soc_skl btbcm btintel dell_laptop irqbypass iwlmvm snd_soc_hdac_hda rapl bluetooth snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_codec_realtek snd_soc_acpi_intel_match snd_soc_acpi dell_smm_hwmon snd_hda_codec_generic intel_cstate ledtrig_audio mac80211 snd_soc_core
> [  257.841816]  dell_wmi intel_uncore snd_compress dell_smbios jitterentropy_rng dcdbas snd_hda_intel sha512_ssse3 serio_raw pcspkr libarc4 snd_intel_dspcfg sha512_generic efi_pstore dell_wmi_descriptor uvcvideo snd_intel_sdw_acpi iwlwifi snd_usb_audio snd_hda_codec dell_wmi_sysman videobuf2_vmalloc videobuf2_memops firmware_attributes_class iTCO_wdt videobuf2_v4l2 intel_pmc_bxt snd_hda_core drbg iTCO_vendor_support snd_usbmidi_lib videobuf2_common intel_wmi_thunderbolt wmi_bmof ee1004 watchdog snd_hwdep ansi_cprng joydev snd_rawmidi hid_multitouch videodev cfg80211 snd_seq_device mc snd_pcm processor_thermal_device_pci_legacy processor_thermal_device snd_timer processor_thermal_rfim processor_thermal_mbox ucsi_acpi processor_thermal_rapl snd mei_me typec_ucsi intel_rapl_common ecdh_generic roles soundcore mei ecc rfkill intel_soc_dts_iosf intel_pch_thermal typec int3403_thermal evdev int340x_thermal_zone dell_smo8800 intel_hid int3400_thermal intel_pmc_core acpi_thermal_rel acpi_pad
> [  257.841935]  sparse_keymap ac parport_pc ppdev sunrpc lp parport fuse configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_crypt dm_mod hid_jabra usbhid r8152 mii hid_generic amdgpu i915 rtsx_pci_sdmmc mmc_core crc32_pclmul crc32c_intel ghash_clmulni_intel gpu_sched nvme aesni_intel e1000e crypto_simd cryptd nvme_core i2c_algo_bit drm_ttm_helper ptp t10_pi ttm psmouse pps_core xhci_pci i2c_i801 drm_kms_helper thunderbolt crc_t10dif xhci_hcd cec i2c_smbus rc_core crct10dif_generic crct10dif_pclmul crct10dif_common rtsx_pci drm usbcore i2c_hid_acpi intel_lpss_pci i2c_hid intel_lpss idma64 usb_common hid wmi battery button video
> [  257.842049] CPU: 4 PID: 58 Comm: kworker/u16:7 Not tainted 5.15.0-3-amd64 #1  Debian 5.15.15-2
> [  257.842057] Hardware name: Dell Inc. Precision 3540/0M14W7, BIOS 1.9.1 07/06/2020
> [  257.842062] Workqueue: events_unbound async_run_entry_fn
> [  257.842075] RIP: 0010:dm_suspend+0x19e/0x1c0 [amdgpu]
> [  257.842795] Code: ff 31 d2 4c 89 e6 4c 89 ef e8 4e d7 15 00 83 f8 01 74 1e 89 c2 48 c7 c6 40 36 f5 c0 48 c7 c7 50 bc 01 c1 e8 14 89 61 ff eb c2 <0f> 0b e9 95 fe ff ff 4c 89 e6 4c 89 ef e8 60 26 15 00 eb ae e8 d9
> [  257.842801] RSP: 0018:ffffac778029fcf0 EFLAGS: 00010286
> [  257.842808] RAX: 0000000000000000 RBX: ffff9e72cb1b5b08 RCX: 0000000000000027
> [  257.842812] RDX: 0000000000000009 RSI: 0000000000000001 RDI: ffff9e72cb1a0000
> [  257.842816] RBP: ffff9e72cb1a0000 R08: 0000000000000032 R09: 0000000000000004
> [  257.842819] R10: 000000000000000f R11: ffffffffb1b82693 R12: ffff9e72cb1a0000
> [  257.842823] R13: 0000000000000004 R14: 0000000000000002 R15: ffff9e72c0145f05
> [  257.842826] FS:  0000000000000000(0000) GS:ffff9e762e500000(0000) knlGS:0000000000000000
> [  257.842831] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  257.842835] CR2: 000055620fb334f6 CR3: 0000000412e10003 CR4: 00000000003706e0
> [  257.842840] Call Trace:
> [  257.842846]  <TASK>
> [  257.842851]  ? vi_common_set_clockgating_state+0x229/0x2f0 [amdgpu]
> [  257.843356]  amdgpu_device_ip_suspend_phase1+0x5e/0xc0 [amdgpu]
> [  257.843771]  amdgpu_device_suspend+0x62/0xc0 [amdgpu]
> [  257.844184]  amdgpu_pmops_suspend+0x36/0x70 [amdgpu]
> [  257.844631]  pci_pm_suspend+0x71/0x160
> [  257.844643]  ? pci_pm_freeze+0xb0/0xb0
> [  257.844651]  dpm_run_callback+0x47/0x120
> [  257.844658]  __device_suspend+0x10e/0x470
> [  257.844664]  async_suspend+0x1b/0x90
> [  257.844669]  async_run_entry_fn+0x2d/0x130
> [  257.844677]  process_one_work+0x1ee/0x390
> [  257.844685]  worker_thread+0x53/0x3e0
> [  257.844690]  ? process_one_work+0x390/0x390
> [  257.844696]  kthread+0x124/0x150
> [  257.844706]  ? set_kthread_struct+0x40/0x40
> [  257.844715]  ret_from_fork+0x1f/0x30
> [  257.844728]  </TASK>
> [  257.844730] ---[ end trace f4b6157e346cd3f6 ]---
> [  258.419015] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vce_v3_0> failed -110
> [  258.878568] amdgpu: 
>                 last message was failed ret is 65535
> 
> [ ... lots of failed messages ...]
> 
> [  259.957788] amdgpu: Failed to force to switch arbf0!
> [  259.957789] amdgpu: [disable_dpm_tasks] Failed to disable DPM!
> [  259.957789] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -22
> [  261.029543] amdgpu 0000:3b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
> [  261.029632] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
> [  263.171945] amdgpu: cp is busy, skip halt cp
> [  264.242959] amdgpu: rlc is busy, skip halt rlc
> 
> [ ... another kernel warning ... ]
> 
> [  265.315820] amdgpu 0000:3b:00.0: amdgpu: PCI CONFIG reset
> [  266.386163] PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -22
> [  266.386248] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -22
> [  266.386253] amdgpu 0000:3b:00.0: PM: failed to suspend async: error -22
> [  266.386382] PM: Some devices failed to suspend, or early wake event detected
> [  266.681752] r8152 4-1.3:1.0 enx00e04c680aef: carrier on
> [  267.069698] OOM killer enabled.
> [  267.069700] Restarting tasks ... 
> 
>  
> Note that suspend works fine when booting linux-image-5.15.0-2-amd64.

Does the issue persist if you upgrade to the most recent 5.16.y
version, currently 5.16.4-1~exp1 (5.16.7-1 should land soon as well)?
Is there any chance you could bisect to find the commit introducing
the issue?
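
In case it helps to classify each test kernel while bisecting, here is a
minimal Python sketch. It is not part of any Debian or kernel tooling,
and the function name suspend_failed is hypothetical. It assumes you
saved the kernel log to a text file after a suspend attempt, and it only
looks for the message strings quoted in your report above.

```python
import re

# Patterns copied from the kernel log quoted in this bug report.
SUSPEND_ENTRY = re.compile(r"PM: suspend entry")
SUSPEND_FAILED = re.compile(r"PM: Some devices failed to suspend")
DEVICE_ERROR = re.compile(r"PM: failed to suspend.*error (-?\d+)")


def suspend_failed(log_text):
    """Inspect the last suspend attempt found in log_text.

    Returns a tuple (failed, errors): failed is True when the kernel
    reported that some devices failed to suspend, and errors lists the
    per-device error codes logged during that attempt.
    """
    lines = log_text.splitlines()

    # Locate the most recent "PM: suspend entry" line, if any.
    start = None
    for index, line in enumerate(lines):
        if SUSPEND_ENTRY.search(line):
            start = index
    if start is None:
        return False, []

    failed = False
    errors = []
    for line in lines[start:]:
        if SUSPEND_FAILED.search(line):
            failed = True
        match = DEVICE_ERROR.search(line)
        if match:
            errors.append(int(match.group(1)))
    return failed, errors
```

A result of (True, ...) would correspond to marking that kernel with
"git bisect bad", and (False, []) to "git bisect good", provided a
suspend was actually attempted before the log was saved.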

Regards,
Salvatore

