Bug#955407: linux-image-4.19.0-8-amd64: "Uhhuh. NMI received for unknown reason" on AMD Ryzen
Dear Maintainer,
I have observed these log messages, too. My system is similar to the submitter's.
I found two occurrences in the kern.log* files that are still available (see the attached file).
One was most probably related to a "GPU fault" about 25 seconds earlier,
running 4.19.0-8-amd64/4.19.98-1.
The other occurred while the system was idle and I was away from it,
running 5.4.0-0.bpo.3-amd64/5.4.13-1~bpo10+1.
I noticed no negative consequences at that time.
Kind regards,
Bernhard
# LANG=C lscpu
...
CPU family: 23
Model: 1
Model name: AMD Ryzen 7 1700 Eight-Core Processor
Stepping: 1
...
# (zcat kern.log.4.gz kern.log.3.gz kern.log.2.gz; cat kern.log.1 kern.log) | grep -E "NMI received|Linux version" -A3
Mar 7 00:10:29 rechner kernel: [ 0.000000] Linux version 4.19.0-8-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.98-1 (2020-01-26)
...
Mar 7 00:10:29 rechner kernel: [ 0.000000] DMI: System manufacturer System Product Name/PRIME B350M-A, BIOS 4801 04/25/2019
...
Mar 7 00:13:49 rechner kernel: [ 205.788363] amdgpu 0000:08:00.0: GPU fault detected: 147 0x0c304401 for process SOTTR.exe pid 3426 thread SOTTR.exe:cs0 pid 3448
Mar 7 00:13:49 rechner kernel: [ 205.788370] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FF22786
Mar 7 00:13:49 rechner kernel: [ 205.788373] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E044001
Mar 7 00:13:49 rechner kernel: [ 205.788378] amdgpu 0000:08:00.0: VM fault (0x01, vmid 7, pasid 32782) at page 267528070, read from 'TC1' (0x54433100) (68)
...
Mar 7 00:13:49 rechner kernel: [ 205.788463] amdgpu 0000:08:00.0: VM fault (0x01, vmid 7, pasid 32782) at page 142747517, read from 'TC1' (0x54433100) (68)
Mar 7 00:13:59 rechner kernel: [ 215.998608] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=25858, emitted seq=25860
Mar 7 00:13:59 rechner kernel: [ 215.998615] [drm] GPU recovery disabled.
Mar 7 00:14:14 rechner kernel: [ 230.580910] Uhhuh. NMI received for unknown reason 2d on CPU 7.
Mar 7 00:14:14 rechner kernel: [ 230.580911] Do you have a strange power saving mode enabled?
Mar 7 00:14:14 rechner kernel: [ 230.580914] Dazed and confused, but trying to continue
Mar 7 00:15:17 rechner kernel: [ 294.139939] sysrq: SysRq : Keyboard mode set to system default
Mar 7 00:15:20 rechner kernel: [ 296.891922] sysrq: SysRq : Terminate All Tasks
(This happened during an attempt to test the Shadow of the Tomb Raider trial via Steam. A kernel crash dump is available; at least the "GPU fault" was reproducible.)
----------
Mar 26 10:00:52 rechner kernel: [ 0.000000] Linux version 5.4.0-0.bpo.3-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 5.4.13-1~bpo10+1 (2020-02-07)
...
Mar 26 10:00:52 rechner kernel: [ 0.000000] DMI: System manufacturer System Product Name/PRIME B350M-A, BIOS 4801 04/25/2019
...
Mar 29 07:27:08 rechner kernel: [246383.312487] Uhhuh. NMI received for unknown reason 3c on CPU 6.
Mar 29 07:27:08 rechner kernel: [246383.312488] Do you have a strange power saving mode enabled?
Mar 29 07:27:08 rechner kernel: [246383.312489] Dazed and confused, but trying to continue
Mar 29 07:35:37 rechner kernel: [246892.656398] Process accounting resumed
(The system was idle at this time. I noticed no negative consequences, and the system could be shut down later without problems.)
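
For anyone who wants to compare their own logs, here is a minimal sketch that condenses the "unknown NMI" lines into one line each. The function name nmi_summary is hypothetical, and the sketch assumes the syslog line layout shown in the excerpts above (month, day, time as the first three fields). The reason value is treated as hexadecimal, which matches how it appears in the messages above (2d, 3c).

```shell
# Hypothetical helper: summarise "unknown NMI" kernel messages read from stdin.
# Assumes syslog-style lines whose first three fields are month, day and time.
nmi_summary() {
    awk '
        match($0, /NMI received for unknown reason [0-9a-f]+ on CPU [0-9]+/) {
            # Cut out just the matched message and split it into words.
            msg = substr($0, RSTART, RLENGTH)
            n = split(msg, w, " ")
            # w[n-3] is the reason code, w[n] is the CPU number.
            printf "%s %s %s reason=0x%s cpu=%s\n", $1, $2, $3, w[n-3], w[n]
        }
    '
}
```

It can be fed from the same pipeline as the grep above, for example:
(zcat kern.log.4.gz kern.log.3.gz kern.log.2.gz; cat kern.log.1 kern.log) | nmi_summary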