On Saturday 24 July 2021 22:03:23 CEST Diederik de Haas wrote:
> > It's already backported to 5.10, just after 5.10.46 was released:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=fea853aca3210c21dfcf07bb82d501b7fd1900a7
>
> Just found out the reverted commit was introduced just before the 5.10.46
> tag was created, which should mean that any version before 5.10.46 should
> NOT have this problem.
I just found out that there are 2 commits that need to be reverted; the
corresponding reverts in linux-5.10.y are:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=1bd81429d53ded4e111616c755a64fad80849354
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=fea853aca3210c21dfcf07bb82d501b7fd1900a7
The first one I saved (and attached) as 'fix-bug991453-part1.patch' and the
second one as 'fix-bug991453-part2.patch'.
Then I followed steps 4.2.1 and 4.2.2 of the kernel handbook:
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-official
That resulted in 'linux-image-5.10.0-8-amd64-unsigned_5.10.46-2a~test_amd64.deb'
which I then installed on my system and rebooted into that.
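For anyone who wants to repeat this, the two handbook steps amount to roughly
the following. This is only a sketch from my reading of the handbook, not a
verbatim copy of it: the 'run' and 'rebuild_with_patches' helpers, the
DRY_RUN switch and the use of sudo are my own illustration, and the package
list should be checked against the handbook itself.

```shell
#!/bin/sh
# Sketch of kernel handbook steps 4.2.1 (preparation) and 4.2.2 (simple
# patching and building). Helper names and DRY_RUN are illustrative only.

# Run a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = unpacked source directory, remaining arguments = patch files.
# Patch paths must be absolute or relative to the source directory.
rebuild_with_patches() {
    srcdir=$1
    shift
    # 4.2.1: install build tools and build dependencies, fetch the source.
    run sudo apt-get install build-essential fakeroot devscripts &&
    run sudo apt-get build-dep linux &&
    run apt-get source linux &&
    run cd "$srcdir" &&
    # 4.2.2: the source package ships a script that applies the given
    # patches and builds a test package.
    run bash debian/bin/test-patches "$@"
}
```

Calling rebuild_with_patches with the unpacked source directory followed by
the paths of the two saved patches, with DRY_RUN=1 set first, prints the
commands without running anything.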
$ uname -a
Linux bagend 5.10.0-8-amd64 #1 SMP Debian 5.10.46-2a~test (2021-07-24) x86_64 GNU/Linux
$ cat /sys/class/drm/card0/device/gpu_busy_percent
0
$ sensors
nvme-pci-0100
Adapter: PCI adapter
Composite: +40.9°C (low = -273.1°C, high = +72.8°C)
(crit = +75.8°C)
Sensor 1: +40.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +51.9°C (low = -273.1°C, high = +65261.8°C)
amdgpu-pci-0c00
Adapter: PCI adapter
vddgfx: 750.00 mV
fan1: 1208 RPM (min = 0 RPM, max = 3500 RPM)
edge: +42.0°C (crit = +85.0°C, hyst = -273.1°C)
(emerg = +90.0°C)
junction: +42.0°C (crit = +105.0°C, hyst = -273.1°C)
(emerg = +110.0°C)
mem: +43.0°C (crit = +95.0°C, hyst = -273.1°C)
(emerg = +100.0°C)
power1: 7.00 W (cap = 260.00 W)
k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +76.5°C
Tdie: +56.5°C
# radeontop
Graphics pipe 0.83%
0.17G / 0.94G Memory Clock 17.67%
0.03G / 1.63G Shader Clock 1.78%
These are the same 'scores' as I had with the 5.10.0-7-amd64 kernel.
So applying the mentioned/attached patches on top of the current kernel
as available in Debian Testing/Bullseye and Sid, fixes the problem.
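A single read of gpu_busy_percent is only a snapshot, so for anyone who wants
to check their own system, here is a small sketch that averages a few
readings. The function name and its arguments are made up for illustration;
the default path is the card0 file used above and may be a different card
number on other systems.

```shell
#!/bin/sh
# Average several readings of amdgpu's gpu_busy_percent sysfs file.
# All arguments are optional (names are illustrative):
#   $1  path to the sysfs file   (default: the card0 path used above)
#   $2  number of samples        (default: 5)
#   $3  seconds between samples  (default: 1)
gpu_busy_avg() {
    file=${1:-/sys/class/drm/card0/device/gpu_busy_percent}
    samples=${2:-5}
    interval=${3:-1}
    total=0
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Each read returns one integer percentage.
        read -r value < "$file" || return 1
        total=$((total + value))
        i=$((i + 1))
        # Wait between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
    # Integer average of the collected samples.
    echo $((total / samples))
}
```

Running 'gpu_busy_avg' without arguments on an idle desktop should then
print a value close to the single reading shown above.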
In the last year I've spent considerable time to bring down my energy
usage/needs and I never expected that (reverting) 2 kernel commits
would save me 67W (continuously), so thank you very much piorunz for
bringing this to my attention.
I normally have a quiet system and noticed it often wasn't quiet lately;
I blame(d) 'baloo' (file indexing) for that, but it turns out it was mostly
my GPU running at 100% all the time.
It looks like a lot of users with AMD GPUs are affected, and a considerable
amount of energy is being wasted because of it (Climate Change), so
I really hope, and urge, that these 2 patches/reverts are applied before
Bullseye gets released.
Cheers,
Diederik
From 1bd81429d53ded4e111616c755a64fad80849354 Mon Sep 17 00:00:00 2001
From: Yifan Zhang <yifan1.zhang@amd.com>
Date: Sat, 19 Jun 2021 11:40:54 +0800
Subject: Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG
issue."
commit ee5468b9f1d3bf48082eed351dace14598e8ca39 upstream.
This reverts commit 4cbbe34807938e6e494e535a68d5ff64edac3f20.
Reason for revert: side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 1859d293ef712..fb15e8b5af32f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3619,12 +3619,8 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
if (ring->use_doorbell) {
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
(adev->doorbell_index.kiq * 2) << 2);
- /* If GC has entered CGPG, ringing doorbell > first page doesn't
- * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround
- * this issue.
- */
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
- (adev->doorbell.size - 4));
+ (adev->doorbell_index.userqueue_end * 2) << 2);
}
WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
--
cgit 1.2.3-1.el7
From fea853aca3210c21dfcf07bb82d501b7fd1900a7 Mon Sep 17 00:00:00 2001
From: Yifan Zhang <yifan1.zhang@amd.com>
Date: Sat, 19 Jun 2021 11:39:43 +0800
Subject: Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to
cover full doorbell."
commit baacf52a473b24e10322b67757ddb92ab8d86717 upstream.
This reverts commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3.
Reason for revert: Side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 3c92dacbc24ad..fc8da5fed779b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -6590,12 +6590,8 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
if (ring->use_doorbell) {
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
(adev->doorbell_index.kiq * 2) << 2);
- /* If GC has entered CGPG, ringing doorbell > first page doesn't
- * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround
- * this issue.
- */
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
- (adev->doorbell.size - 4));
+ (adev->doorbell_index.userqueue_end * 2) << 2);
}
WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
--
cgit 1.2.3-1.el7