On Saturday 24 July 2021 22:03:23 CEST Diederik de Haas wrote:
> > It's already backported to 5.10, just after 5.10.46 was released:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=fea853aca3210c21dfcf07bb82d501b7fd1900a7
>
> Just found out the reverted commit was introduced just before the 5.10.46
> tag was created, which should mean that any version before 5.10.46 should
> NOT have this problem.

I just found out that 2 commits need to be reverted, and these are the
corresponding reverts in linux-5.10.y:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=1bd81429d53ded4e111616c755a64fad80849354
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=fea853aca3210c21dfcf07bb82d501b7fd1900a7

I saved (and attached) the first one as 'fix-bug991453-part1.patch' and the
second one as 'fix-bug991453-part2.patch'.

Then I followed steps 4.2.1 and 4.2.2 of the kernel handbook:
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-official

That resulted in 'linux-image-5.10.0-8-amd64-unsigned_5.10.46-2a~test_amd64.deb',
which I then installed on my system and rebooted into.
$ uname -a
Linux bagend 5.10.0-8-amd64 #1 SMP Debian 5.10.46-2a~test (2021-07-24) x86_64 GNU/Linux

$ cat /sys/class/drm/card0/device/gpu_busy_percent
0

$ sensors
nvme-pci-0100
Adapter: PCI adapter
Composite:    +40.9°C  (low  = -273.1°C, high = +72.8°C)
                       (crit = +75.8°C)
Sensor 1:     +40.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +51.9°C  (low  = -273.1°C, high = +65261.8°C)

amdgpu-pci-0c00
Adapter: PCI adapter
vddgfx:      750.00 mV
fan1:        1208 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +42.0°C  (crit = +85.0°C, hyst = -273.1°C)
                       (emerg = +90.0°C)
junction:     +42.0°C  (crit = +105.0°C, hyst = -273.1°C)
                       (emerg = +110.0°C)
mem:          +43.0°C  (crit = +95.0°C, hyst = -273.1°C)
                       (emerg = +100.0°C)
power1:        7.00 W  (cap = 260.00 W)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +76.5°C
Tdie:         +56.5°C

# radeontop
Graphics pipe 0.83%
0.17G / 0.94G Memory Clock 17.67%
0.03G / 1.63G Shader Clock 1.78%

These are the same 'scores' as I had with the 5.10.0-7-amd64 kernel.
So applying the mentioned/attached patches on top of the current kernel, as
available in Debian Testing/Bullseye and Sid, fixes the problem.

In the last year I've spent considerable time bringing down my energy
usage/needs, and I never expected that (reverting) 2 kernel commits would save
me 67W (continuously), so thank you very much piorunz for bringing this to my
attention.
I normally have a quiet system and noticed that it often wasn't quiet lately.
I blame(d) 'baloo' (file indexing) for that, but it turns out it was mostly my
GPU running at 100% all the time.

As it looks like a lot of users with AMD GPUs are affected, and considerable
energy is wasted because of it (Climate Change), I really hope/urge that these
2 patches/reverts are applied before Bullseye gets released.

Cheers,
  Diederik
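
For anyone who wants to check whether their own system shows the same symptom, here is a minimal sketch that samples the sysfs file shown above and puts the 67W figure into yearly terms. The `card0` path is taken from my output and may differ on other systems. The 95% threshold and the function names are illustrative assumptions, not anything provided by the amdgpu driver.

```python
import time
from pathlib import Path

# Path as used in the output above; "card0" is an assumption for other systems.
BUSY_PATH = Path("/sys/class/drm/card0/device/gpu_busy_percent")


def read_gpu_busy(path=BUSY_PATH):
    """Return the GPU load in percent as reported by the sysfs file."""
    return int(Path(path).read_text().strip())


def looks_stuck(samples, threshold=95):
    """True if every sample is at or above the threshold.

    The threshold is an arbitrary, illustrative choice.
    """
    return bool(samples) and all(s >= threshold for s in samples)


def yearly_kwh(watts):
    """Energy in kWh consumed by a constant draw of `watts` over 365 days."""
    return watts * 24 * 365 / 1000


if __name__ == "__main__":
    if BUSY_PATH.exists():
        samples = []
        for _ in range(5):
            samples.append(read_gpu_busy())
            time.sleep(1)
        print("samples:", samples, "stuck:", looks_stuck(samples))
    print("67 W continuously is about %.0f kWh per year" % yearly_kwh(67))
```

Run it on an otherwise idle desktop: samples that stay near 100 would match what I saw before applying the reverts, while values near 0 match the output above.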
From 1bd81429d53ded4e111616c755a64fad80849354 Mon Sep 17 00:00:00 2001
From: Yifan Zhang <yifan1.zhang@amd.com>
Date: Sat, 19 Jun 2021 11:40:54 +0800
Subject: Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue."

commit ee5468b9f1d3bf48082eed351dace14598e8ca39 upstream.

This reverts commit 4cbbe34807938e6e494e535a68d5ff64edac3f20.

Reason for revert: side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 1859d293ef712..fb15e8b5af32f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3619,12 +3619,8 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
 	if (ring->use_doorbell) {
 		WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
 					(adev->doorbell_index.kiq * 2) << 2);
-		/* If GC has entered CGPG, ringing doorbell > first page doesn't
-		 * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround
-		 * this issue.
-		 */
 		WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
-					(adev->doorbell.size - 4));
+					(adev->doorbell_index.userqueue_end * 2) << 2);
 	}
 
 	WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
-- 
cgit 1.2.3-1.el7
From fea853aca3210c21dfcf07bb82d501b7fd1900a7 Mon Sep 17 00:00:00 2001
From: Yifan Zhang <yifan1.zhang@amd.com>
Date: Sat, 19 Jun 2021 11:39:43 +0800
Subject: Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell."

commit baacf52a473b24e10322b67757ddb92ab8d86717 upstream.

This reverts commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3.

Reason for revert: Side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 3c92dacbc24ad..fc8da5fed779b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -6590,12 +6590,8 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
 	if (ring->use_doorbell) {
 		WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
 					(adev->doorbell_index.kiq * 2) << 2);
-		/* If GC has entered CGPG, ringing doorbell > first page doesn't
-		 * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround
-		 * this issue.
-		 */
 		WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
-					(adev->doorbell.size - 4));
+					(adev->doorbell_index.userqueue_end * 2) << 2);
 	}
 
 	WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
-- 
cgit 1.2.3-1.el7
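
Both patches make the same one-line change to the value written to mmCP_MEC_DOORBELL_RANGE_UPPER. As a rough illustration of the arithmetic only, the sketch below evaluates the two expressions from the diffs side by side. The input numbers are made up for the example and are not read from any real GPU; the function names are hypothetical as well.

```python
def range_upper_enlarged(doorbell_size):
    """Expression removed by the reverts: adev->doorbell.size - 4."""
    return doorbell_size - 4


def range_upper_restored(userqueue_end):
    """Expression restored by the reverts:
    (adev->doorbell_index.userqueue_end * 2) << 2, which equals index * 8.
    """
    return (userqueue_end * 2) << 2


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the difference in magnitude.
    doorbell_size = 8192
    userqueue_end = 138

    print("enlarged upper bound:", range_upper_enlarged(doorbell_size))
    print("restored upper bound:", range_upper_restored(userqueue_end))
```

With these example inputs the restored bound is much smaller than the enlarged one, which fits the commit messages describing the reverted change as "enlarging CP_MEC_DOORBELL_RANGE".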