[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#990279: 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo map") breaks amdgpu on ppc64 machines?





Am 30.10.21 um 16:32 schrieb Salvatore Bonaccorso:
On Wed, Oct 13, 2021 at 10:08:21PM +0200, Salvatore Bonaccorso wrote:
Hi,

On Mon, Oct 11, 2021 at 10:30:21AM +0200, Christian König wrote:
Am 10.10.21 um 16:14 schrieb Xi Ruoyao:
On Sun, 2021-10-10 at 14:46 +0100, Nathaniel Filardo wrote:
It occurs to me, quite belatedly, that it may be worth asking the
author, reviewers, and signers of the change in question their
thoughts on this bug report:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.debian.org%2Fcgi-bin%2Fbugreport.cgi%3Fbug%3D990279&data=04%7C01%7Cchristian.koenig%40amd.com%7C3eab702e82bc4ab81fbd08d99bb21419%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637712012456263348%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=KFxF9He7545uxYylgQ%2F4HdljVuGoGUbvHh9xohbxu4o%3D&reserved=0

In particular, on ppc64 systems, Linux typically is configured to use
a 64KiB page (i.e., shift 16) rather than 4KiB (shift 12) page.  It
looks, however, that AMDGPU_GPU_PAGE_SIZE is always 4096, and so
something (perhaps in userspace, even, eek?) is requesting
4KiB-but-not-64KiB alignment of this buffer.
Christian told me the buffer should be aligned to *CPU* page boundary,
or the page table in AMDGPU driver will be corrupted:
Yeah, that's indeed correct. And that intentionally breaks because otherwise
we can corrupt the page tables and potentially cause much worse trouble.

Question is more why userspace isn't told the correct value in your branch.

the value of num_entries must always be a multiple of
AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables.
You need to identify the root cause of this, most likely start or last
are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE.
IMO f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
dev_info.gart_page_size" should be backported along with this, which
makes the kernel to provide the CPU page size to libdrm and mesa and
correct userspace behavior.  I'm not sure why only one is backported.

Yes, exactly that sounds like the correct fix to me as well.
So, the 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo
map") was backported to several stable series 4.14.229, 4.19.185,
5.4.110, 5.10.28 and 5.11.12 but not the
f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
dev_info.gart_page_size".

What is confusely is that all of those backports reference as upstream
commit e3512fb67093fabdf27af303066627b921ee9bd8 and not
9a89a721b41b23c6da8f8a6dd0e382966a850dcf which might be in part source
of the confusion?

Can any of you request to backport
f4d3da72a76a9ce5f57bba64788931686a9dc333 as well for those stable
series where relevant?
Here is the proposed change. Should this be submitted to stable for
5.10.y?

You can drop the max_t(), just using PAGE_SIZE should work since there can't be any smaller page size than 4k or the driver won't work at all.

Regards,
Christian.


Regards,
Salvatore

 From 02c987eb2ab0cdfd536d08bf812f4e37d3cc150a Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhc@lemote.com>
Date: Tue, 30 Mar 2021 23:33:33 +0800
Subject: [PATCH] drm/amdgpu: Set a suitable dev_info.gart_page_size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit f4d3da72a76a9ce5f57bba64788931686a9dc333 upstream.

In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
driver requires an alignment on CPU pages.  So, for non-4KB page system,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).

Signed-off-by: Rui Wang <wangr@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Floongson-community%2Flinux-stable%2Fcommit%2Fcaa9c0a1&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C3eab702e82bc4ab81fbd08d99bb21419%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637712012456273340%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=C8Dz22MXtb6u012FMN%2BRm0i9vDNkn5LJrkqX2qACzqY%3D&amp;reserved=0
[Xi: rebased for drm-next, use max_t for checkpatch,
      and reworded commit message.]
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
BugLink: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1549&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C3eab702e82bc4ab81fbd08d99bb21419%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637712012456273340%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=Azbz96fafUgilLJIFAOzXadKygHFVqYBxlh%2FYvhNtdE%3D&amp;reserved=0
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Salvatore Bonaccorso: Backport to 5.10.y which does not contain
a5a52a43eac0 ("drm/amd/amdgpu/amdgpu_kms: Remove 'struct
drm_amdgpu_info_device dev_info' from the stack") which removes dev_info
from the stack and places it on the heap.]
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index efda38349a03..917b94002f4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -766,9 +766,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
  			dev_info.high_va_offset = AMDGPU_GMC_HOLE_END;
  			dev_info.high_va_max = AMDGPU_GMC_HOLE_END | vm_size;
  		}
-		dev_info.virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
+		dev_info.virtual_address_alignment = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
  		dev_info.pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;
-		dev_info.gart_page_size = AMDGPU_GPU_PAGE_SIZE;
+		dev_info.gart_page_size = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
  		dev_info.cu_active_number = adev->gfx.cu_info.number;
  		dev_info.cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
  		dev_info.ce_ram_size = adev->gfx.ce_ram_size;


Reply to: