
Bug#990279: 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo map") breaks amdgpu on ppc64 machines?



Hi,

On Mon, Oct 11, 2021 at 10:30:21AM +0200, Christian König wrote:
> Am 10.10.21 um 16:14 schrieb Xi Ruoyao:
> > On Sun, 2021-10-10 at 14:46 +0100, Nathaniel Filardo wrote:
> > > It occurs to me, quite belatedly, that it may be worth asking the
> > > author, reviewers, and signers of the change in question their
> > > thoughts on this bug report:
> > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990279
> > > 
> > > In particular, on ppc64 systems, Linux typically is configured to use
> > > a 64KiB page (i.e., shift 16) rather than 4KiB (shift 12) page.  It
> > > looks, however, that AMDGPU_GPU_PAGE_SIZE is always 4096, and so
> > > something (perhaps in userspace, even, eek?) is requesting
> > > 4KiB-but-not-64KiB alignment of this buffer.
> > Christian told me the buffer should be aligned to *CPU* page boundary,
> > or the page table in AMDGPU driver will be corrupted:
> 
> Yeah, that's indeed correct. And that intentionally breaks because otherwise
> we can corrupt the page tables and potentially cause much worse trouble.
> 
> Question is more why userspace isn't told the correct value in your branch.
> 
> > 
> > > the value of num_entries must always be a multiple of
> > > AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables.
> > > You need to identify the root cause of this, most likely start or last
> > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE.
> > IMO f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
> > dev_info.gart_page_size" should be backported along with this, which
> > makes the kernel to provide the CPU page size to libdrm and mesa and
> > correct userspace behavior.  I'm not sure why only one is backported.
> 
> 
> Yes, exactly that sounds like the correct fix to me as well.

So, 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo
map") was backported to several stable series (4.14.229, 4.19.185,
5.4.110, 5.10.28 and 5.11.12), but
f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
dev_info.gart_page_size" was not.
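
To illustrate why having only the first commit hurts 64KiB-page
systems, here is a rough Python sketch. It is not the kernel code: the
function names are made up for illustration, and it assumes the check
simply requires the start and size of a mapping to be multiples of the
CPU page size, with AMDGPU_GPU_PAGE_SIZE fixed at 4096 as noted in the
quoted discussion.

```python
# Illustrative sketch only, not the amdgpu driver's actual code.
# All function names here are hypothetical.

GPU_PAGE_SIZE = 4096  # AMDGPU_GPU_PAGE_SIZE, always 4KiB per the thread


def gpu_pages_in_cpu_page(cpu_page_size):
    """Number of GPU pages covered by one CPU page
    (what the thread calls AMDGPU_GPU_PAGES_IN_CPU_PAGE)."""
    return cpu_page_size // GPU_PAGE_SIZE


def is_aligned(value, alignment):
    """True if value is a multiple of alignment."""
    return value % alignment == 0


def mapping_accepted(start, size, cpu_page_size):
    """Assumed form of the check: start and size must both be
    aligned to the CPU page size, not merely the GPU page size."""
    return is_aligned(start, cpu_page_size) and is_aligned(size, cpu_page_size)


if __name__ == "__main__":
    for cpu_page_size in (4096, 65536):
        # A request that is 4KiB-aligned but not 64KiB-aligned, as a
        # userspace that assumes a 4KiB page size might issue.
        ok = mapping_accepted(0x1000, 0x1000, cpu_page_size)
        print(cpu_page_size, gpu_pages_in_cpu_page(cpu_page_size), ok)
```

With a 4KiB CPU page the request passes; with a 64KiB CPU page the
same request is rejected unless userspace has been told the larger
page size, which is what the second commit is meant to do.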

What is confusing is that all of those backports reference
e3512fb67093fabdf27af303066627b921ee9bd8 as the upstream commit, and
not 9a89a721b41b23c6da8f8a6dd0e382966a850dcf, which might in part be
the source of the confusion?

Could one of you request a backport of
f4d3da72a76a9ce5f57bba64788931686a9dc333 as well, for those stable
series where it is relevant?

Regards,
Salvatore

