[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#990279: 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo map") breaks amdgpu on ppc64 machines?



On Wed, Oct 13, 2021 at 10:08:21PM +0200, Salvatore Bonaccorso wrote:
> Hi,
> 
> On Mon, Oct 11, 2021 at 10:30:21AM +0200, Christian König wrote:
> > Am 10.10.21 um 16:14 schrieb Xi Ruoyao:
> > > On Sun, 2021-10-10 at 14:46 +0100, Nathaniel Filardo wrote:
> > > > It occurs to me, quite belatedly, that it may be worth asking the
> > > > author, reviewers, and signers of the change in question their
> > > > thoughts on this bug report:
> > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.debian.org%2Fcgi-bin%2Fbugreport.cgi%3Fbug%3D990279&data=04%7C01%7Cchristian.koenig%40amd.com%7C915628061dd746062c5408d98bf84df9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637694721282436279%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=V4R4BPCHNQzx2bF6STDzfjW%2BQezZg89w8%2FEeRpuRVnM%3D&reserved=0
> > > > 
> > > > In particular, on ppc64 systems, Linux typically is configured to use
> > > > a 64KiB page (i.e., shift 16) rather than 4KiB (shift 12) page.  It
> > > > looks, however, that AMDGPU_GPU_PAGE_SIZE is always 4096, and so
> > > > something (perhaps in userspace, even, eek?) is requesting
> > > > 4KiB-but-not-64KiB alignment of this buffer.
> > > Christian told me the buffer should be aligned to *CPU* page boundary,
> > > or the page table in AMDGPU driver will be corrupted:
> > 
> > Yeah, that's indeed correct. And that intentionally breaks because otherwise
> > we can corrupt the page tables and potentially cause much worse trouble.
> > 
> > Question is more why userspace isn't told the correct value in your branch.
> > 
> > > 
> > > > the value of num_entries must always be a multiple of
> > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables.
> > > > You need to identify the root cause of this, most likely start or last
> > > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE.
> > > IMO f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
> > > dev_info.gart_page_size" should be backported along with this, which
> > > makes the kernel to provide the CPU page size to libdrm and mesa and
> > > correct userspace behavior.  I'm not sure why only one is backported.
> > 
> > 
> > Yes, exactly that sounds like the correct fix to me as well.
> 
> So, the 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo
> map") was backported to several stable series 4.14.229, 4.19.185,
> 5.4.110, 5.10.28 and 5.11.12 but not the
> f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
> dev_info.gart_page_size".
> 
> What is confusely is that all of those backports reference as upstream
> commit e3512fb67093fabdf27af303066627b921ee9bd8 and not
> 9a89a721b41b23c6da8f8a6dd0e382966a850dcf which might be in part source
> of the confusion?
> 
> Can any of you request to backport
> f4d3da72a76a9ce5f57bba64788931686a9dc333 as well for those stable
> series where relevant?

Here is the proposed change. Should this be submitted to stable for
5.10.y?

Regards,
Salvatore

>From 02c987eb2ab0cdfd536d08bf812f4e37d3cc150a Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhc@lemote.com>
Date: Tue, 30 Mar 2021 23:33:33 +0800
Subject: [PATCH] drm/amdgpu: Set a suitable dev_info.gart_page_size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit f4d3da72a76a9ce5f57bba64788931686a9dc333 upstream.

In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
driver requires an alignment on CPU pages.  So, for non-4KB page system,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).

Signed-off-by: Rui Wang <wangr@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1
[Xi: rebased for drm-next, use max_t for checkpatch,
     and reworded commit message.]
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Salvatore Bonaccorso: Backport to 5.10.y which does not contain
a5a52a43eac0 ("drm/amd/amdgpu/amdgpu_kms: Remove 'struct
drm_amdgpu_info_device dev_info' from the stack") which removes dev_info
from the stack and places it on the heap.]
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index efda38349a03..917b94002f4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -766,9 +766,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
 			dev_info.high_va_offset = AMDGPU_GMC_HOLE_END;
 			dev_info.high_va_max = AMDGPU_GMC_HOLE_END | vm_size;
 		}
-		dev_info.virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
+		dev_info.virtual_address_alignment = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
 		dev_info.pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;
-		dev_info.gart_page_size = AMDGPU_GPU_PAGE_SIZE;
+		dev_info.gart_page_size = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
 		dev_info.cu_active_number = adev->gfx.cu_info.number;
 		dev_info.cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
 		dev_info.ce_ram_size = adev->gfx.ce_ram_size;
-- 
2.33.0


Reply to: