[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#990279: 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo map") breaks amdgpu on ppc64 machines?



It looks like the missing patch made its way into
5.15.0-0.bpo.2-powerpc64le, as f4d3da72a76a9ce... I think.  As a
result, I think this bug is overcome by events and can be closed.

On Tue, Nov 2, 2021 at 12:37 PM Christian König
<christian.koenig@amd.com> wrote:
>
>
>
> Am 30.10.21 um 16:32 schrieb Salvatore Bonaccorso:
> > On Wed, Oct 13, 2021 at 10:08:21PM +0200, Salvatore Bonaccorso wrote:
> >> Hi,
> >>
> >> On Mon, Oct 11, 2021 at 10:30:21AM +0200, Christian König wrote:
> >>> Am 10.10.21 um 16:14 schrieb Xi Ruoyao:
> >>>> On Sun, 2021-10-10 at 14:46 +0100, Nathaniel Filardo wrote:
> >>>>> It occurs to me, quite belatedly, that it may be worth asking the
> >>>>> author, reviewers, and signers of the change in question their
> >>>>> thoughts on this bug report:
> >>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.debian.org%2Fcgi-bin%2Fbugreport.cgi%3Fbug%3D990279&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C3eab702e82bc4ab81fbd08d99bb21419%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637712012456263348%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=KFxF9He7545uxYylgQ%2F4HdljVuGoGUbvHh9xohbxu4o%3D&amp;reserved=0
> >>>>>
> >>>>> In particular, on ppc64 systems, Linux typically is configured to use
> >>>>> a 64KiB page (i.e., shift 16) rather than 4KiB (shift 12) page.  It
> >>>>> looks, however, that AMDGPU_GPU_PAGE_SIZE is always 4096, and so
> >>>>> something (perhaps in userspace, even, eek?) is requesting
> >>>>> 4KiB-but-not-64KiB alignment of this buffer.
> >>>> Christian told me the buffer should be aligned to *CPU* page boundary,
> >>>> or the page table in AMDGPU driver will be corrupted:
> >>> Yeah, that's indeed correct. And that intentionally breaks because otherwise
> >>> we can corrupt the page tables and potentially cause much worse trouble.
> >>>
> >>> Question is more why userspace isn't told the correct value in your branch.
> >>>
> >>>>> the value of num_entries must always be a multiple of
> >>>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables.
> >>>>> You need to identify the root cause of this, most likely start or last
> >>>>> are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE.
> >>>> IMO f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
> >>>> dev_info.gart_page_size" should be backported along with this, which
> >>>> makes the kernel to provide the CPU page size to libdrm and mesa and
> >>>> correct userspace behavior.  I'm not sure why only one is backported.
> >>>
> >>> Yes, exactly that sounds like the correct fix to me as well.
> >> So, the 9a89a721b41b (" drm/amdgpu: check alignment on CPU page for bo
> >> map") was backported to several stable series 4.14.229, 4.19.185,
> >> 5.4.110, 5.10.28 and 5.11.12 but not the
> >> f4d3da72a76a9ce5f57bba64788931686a9dc333 "drm/amdgpu: Set a suitable
> >> dev_info.gart_page_size".
> >>
> >> What is confusely is that all of those backports reference as upstream
> >> commit e3512fb67093fabdf27af303066627b921ee9bd8 and not
> >> 9a89a721b41b23c6da8f8a6dd0e382966a850dcf which might be in part source
> >> of the confusion?
> >>
> >> Can any of you request to backport
> >> f4d3da72a76a9ce5f57bba64788931686a9dc333 as well for those stable
> >> series where relevant?
> > Here is the proposed change. Should this be submitted to stable for
> > 5.10.y?
>
> You can drop the max_t(), just using PAGE_SIZE should work since there
> can't be any smaller page size than 4k or the driver won't work at all.
>
> Regards,
> Christian.
>
> >
> > Regards,
> > Salvatore
> >
> >  From 02c987eb2ab0cdfd536d08bf812f4e37d3cc150a Mon Sep 17 00:00:00 2001
> > From: Huacai Chen <chenhc@lemote.com>
> > Date: Tue, 30 Mar 2021 23:33:33 +0800
> > Subject: [PATCH] drm/amdgpu: Set a suitable dev_info.gart_page_size
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > commit f4d3da72a76a9ce5f57bba64788931686a9dc333 upstream.
> >
> > In Mesa, dev_info.gart_page_size is used for alignment and it was
> > set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
> > driver requires an alignment on CPU pages.  So, for non-4KB page system,
> > gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).
> >
> > Signed-off-by: Rui Wang <wangr@lemote.com>
> > Signed-off-by: Huacai Chen <chenhc@lemote.com>
> > Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Floongson-community%2Flinux-stable%2Fcommit%2Fcaa9c0a1&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C3eab702e82bc4ab81fbd08d99bb21419%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637712012456273340%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=C8Dz22MXtb6u012FMN%2BRm0i9vDNkn5LJrkqX2qACzqY%3D&amp;reserved=0
> > [Xi: rebased for drm-next, use max_t for checkpatch,
> >       and reworded commit message.]
> > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
> > BugLink: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1549&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C3eab702e82bc4ab81fbd08d99bb21419%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637712012456273340%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=Azbz96fafUgilLJIFAOzXadKygHFVqYBxlh%2FYvhNtdE%3D&amp;reserved=0
> > Tested-by: Dan Horák <dan@danny.cz>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > [Salvatore Bonaccorso: Backport to 5.10.y which does not contain
> > a5a52a43eac0 ("drm/amd/amdgpu/amdgpu_kms: Remove 'struct
> > drm_amdgpu_info_device dev_info' from the stack") which removes dev_info
> > from the stack and places it on the heap.]
> > Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > index efda38349a03..917b94002f4b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > @@ -766,9 +766,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
> >                       dev_info.high_va_offset = AMDGPU_GMC_HOLE_END;
> >                       dev_info.high_va_max = AMDGPU_GMC_HOLE_END | vm_size;
> >               }
> > -             dev_info.virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
> > +             dev_info.virtual_address_alignment = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
> >               dev_info.pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;
> > -             dev_info.gart_page_size = AMDGPU_GPU_PAGE_SIZE;
> > +             dev_info.gart_page_size = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
> >               dev_info.cu_active_number = adev->gfx.cu_info.number;
> >               dev_info.cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
> >               dev_info.ce_ram_size = adev->gfx.ce_ram_size;
>


Reply to: