Re: 5.10.40-1~bpo10+1: nvme - Invalid SGL for payload:131072 nents:13
Hi Andy,

On Sun, Jul 25, 2021 at 10:01:02AM +0000, Andy Smith wrote:
> Hello!
>
> On Sat, Jul 24, 2021 at 08:01:10PM +0000, Andy Smith wrote:
> > I've been suggested two commits to try:
> >
> > - <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/dma/swiotlb.c?id=5f89468e2f060031cd89fd4287298e0eaf246bf6> as suggested in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026639.html>
> >
> > - <https://lore.kernel.org/patchwork/patch/1442338/> as suggested
> > in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026740.html>
> >
> > The first one I can't work out how to apply to
> > linux-image-amd64/buster-backports because it's for 5.13 and too
> > many other changes happened.
>
> I see now that the first one went in 5.10.46:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/queue-5.10/swiotlb-manipulate-orig_addr-when-tlb_addr-has-offset.patch?id=fc7b255bfae6e62091006146ca685a25ec6f69c6
>
> > The second looks like I could test it quite easily. It only got into
> > upstream at 5.14-rc1 though.
>
> I went ahead and tried out the second patch since it was easiest and
> it seems to have fixed my issue!
>
> To recap, I was able to reproduce the issue within seconds by
> running this fio job inside a Xen guest:
>
>     fio --name=randread \
>         --filename=/srv/fio/test \
>         --size=35g \
>         --numjobs=1 \
>         --rw=randread \
>         --direct=1 \
>         --ioengine=libaio \
>         --iodepth=16 \
>         --blocksize_range=4k-4m \
>         --blocksize_unaligned=0 \
>         --gtod_reduce=1 \
>         --iodepth=64 \
>         --time_based \
>         --runtime=4h
>
> Where /srv/fio/ is an ext4 filesystem that is the first partition on
> a block device, intentionally misaligned by starting it 63 sectors
> in to the device instead of the more modern practice of 2048 or
> whatever.
>
> I see this patch made it in to 5.10.50:
>
> https://lore.kernel.org/patchwork/patch/1442338/
>
> It's not present in buster-backports nor in bullseye, as far as I
> can see. What is the process to get it in to bullseye's kernel and
> can it be backported for buster? It seems like quite a serious issue
> as NVMe is not exotic and this seems like it can cause broken
> filesystems and RAIDs.

Thanks for your extensive investigation. I will cherry-pick the patch
for the next bullseye upload (and so it will go into buster-backports
as well, once possible).

Regards,
Salvatore