
Bug#955407: linux-image-4.19.0-8-amd64: "Uhhuh. NMI received for unknown reason" on AMD Ryzen



Dear Maintainer,
I have observed these messages, too. My system is similar to the submitter's.
I found two occurrences in the kern.log* files that are still available (see the attached file).

One was most probably related to a "GPU fault" 25 seconds before,
running 4.19.0-8-amd64/4.19.98-1.

The other occurred while the system was idle and I was not at the machine,
running 5.4.0-0.bpo.3-amd64/5.4.13-1~bpo10+1.
I noticed no negative consequences at that time.

Kind regards,
Bernhard
# LANG=C lscpu
...
CPU family:          23
Model:               1
Model name:          AMD Ryzen 7 1700 Eight-Core Processor
Stepping:            1
...


# (zcat kern.log.4.gz kern.log.3.gz kern.log.2.gz; cat kern.log.1 kern.log) | grep -E "NMI received|Linux version" -A3

Mar  7 00:10:29 rechner kernel: [    0.000000] Linux version 4.19.0-8-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.98-1 (2020-01-26)
...
Mar  7 00:10:29 rechner kernel: [    0.000000] DMI: System manufacturer System Product Name/PRIME B350M-A, BIOS 4801 04/25/2019
...
Mar  7 00:13:49 rechner kernel: [  205.788363] amdgpu 0000:08:00.0: GPU fault detected: 147 0x0c304401 for process SOTTR.exe pid 3426 thread SOTTR.exe:cs0 pid 3448
Mar  7 00:13:49 rechner kernel: [  205.788370] amdgpu 0000:08:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF22786
Mar  7 00:13:49 rechner kernel: [  205.788373] amdgpu 0000:08:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E044001
Mar  7 00:13:49 rechner kernel: [  205.788378] amdgpu 0000:08:00.0: VM fault (0x01, vmid 7, pasid 32782) at page 267528070, read from 'TC1' (0x54433100) (68)
...
Mar  7 00:13:49 rechner kernel: [  205.788463] amdgpu 0000:08:00.0: VM fault (0x01, vmid 7, pasid 32782) at page 142747517, read from 'TC1' (0x54433100) (68)
Mar  7 00:13:59 rechner kernel: [  215.998608] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=25858, emitted seq=25860
Mar  7 00:13:59 rechner kernel: [  215.998615] [drm] GPU recovery disabled.
Mar  7 00:14:14 rechner kernel: [  230.580910] Uhhuh. NMI received for unknown reason 2d on CPU 7.
Mar  7 00:14:14 rechner kernel: [  230.580911] Do you have a strange power saving mode enabled?
Mar  7 00:14:14 rechner kernel: [  230.580914] Dazed and confused, but trying to continue
Mar  7 00:15:17 rechner kernel: [  294.139939] sysrq: SysRq : Keyboard mode set to system default
Mar  7 00:15:20 rechner kernel: [  296.891922] sysrq: SysRq : Terminate All Tasks

(This happened while trying to test the Shadow of the Tomb Raider trial via Steam. A kernel crash dump is available; at least the "GPU fault" was reproducible.)
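
As a rough cross-check of the "25 seconds" mentioned in the mail text, the gap can be computed from the bracketed kernel uptime stamps of the first "GPU fault" line and the NMI line. This is only a minimal sketch: the two lines are copied from the excerpt above, while the names UPTIME_RE and uptime_seconds are made up for illustration.

```python
import re

# Matches the bracketed uptime stamp the kernel puts on each line,
# e.g. "kernel: [  205.788363]". Hypothetical helper, not part of any tool.
UPTIME_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def uptime_seconds(line: str) -> float:
    """Return the kernel uptime stamp of a kern.log line, in seconds."""
    match = UPTIME_RE.search(line)
    if match is None:
        raise ValueError("no kernel uptime stamp found in line")
    return float(match.group(1))


# Both lines are taken verbatim from the log excerpt above.
gpu_fault_line = (
    "Mar  7 00:13:49 rechner kernel: [  205.788363] amdgpu 0000:08:00.0: "
    "GPU fault detected: 147 0x0c304401 for process SOTTR.exe pid 3426 "
    "thread SOTTR.exe:cs0 pid 3448"
)
nmi_line = (
    "Mar  7 00:14:14 rechner kernel: [  230.580910] "
    "Uhhuh. NMI received for unknown reason 2d on CPU 7."
)

gap = uptime_seconds(nmi_line) - uptime_seconds(gpu_fault_line)
print(f"NMI logged {gap:.1f} s after the first GPU fault")
```

The result is a little under 25 seconds, which matches the estimate in the mail text. It says nothing about whether the GPU fault actually caused the NMI.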

----------

Mar 26 10:00:52 rechner kernel: [    0.000000] Linux version 5.4.0-0.bpo.3-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 5.4.13-1~bpo10+1 (2020-02-07)
...
Mar 26 10:00:52 rechner kernel: [    0.000000] DMI: System manufacturer System Product Name/PRIME B350M-A, BIOS 4801 04/25/2019
...
Mar 29 07:27:08 rechner kernel: [246383.312487] Uhhuh. NMI received for unknown reason 3c on CPU 6.
Mar 29 07:27:08 rechner kernel: [246383.312488] Do you have a strange power saving mode enabled?
Mar 29 07:27:08 rechner kernel: [246383.312489] Dazed and confused, but trying to continue
Mar 29 07:35:37 rechner kernel: [246892.656398] Process accounting resumed

(The system was idle at this time. No negative consequences were noticed, and the system could be shut down later without problems.)
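
To compare the two events, the reason byte and the CPU number can be pulled out of the "NMI received" lines. The sketch below assumes that the kernel prints the reason byte in hexadecimal, which is my understanding of the message format. NMI_RE and parse_nmi are hypothetical names, and the two lines are copied from the excerpts above.

```python
import re

# Assumption: the reason is printed as a two-digit hexadecimal byte.
NMI_RE = re.compile(
    r"NMI received for unknown reason ([0-9a-f]{2}) on CPU (\d+)\."
)


def parse_nmi(line: str):
    """Return (reason_byte, cpu) for an "unknown NMI" line, else None."""
    match = NMI_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2))


# Both lines are taken verbatim from the log excerpts above.
lines = [
    "Mar  7 00:14:14 rechner kernel: [  230.580910] "
    "Uhhuh. NMI received for unknown reason 2d on CPU 7.",
    "Mar 29 07:27:08 rechner kernel: [246383.312487] "
    "Uhhuh. NMI received for unknown reason 3c on CPU 6.",
]

events = [parse_nmi(line) for line in lines]
for reason, cpu in events:
    # Show the byte in hex and in binary so the set bits can be compared.
    print(f"reason 0x{reason:02x} ({reason:08b}b) on CPU {cpu}")
```

The two events report different reason bytes (2d and 3c) on different CPUs (7 and 6), so they do not obviously share a single trigger.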
