Re: 5.10.40-1~bpo10+1: nvme - Invalid SGL for payload:131072 nents:13
Hello!
On Sat, Jul 24, 2021 at 08:01:10PM +0000, Andy Smith wrote:
> I've been suggested two commits to try:
>
> - <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/dma/swiotlb.c?id=5f89468e2f060031cd89fd4287298e0eaf246bf6> as suggested in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026639.html>
>
> - <https://lore.kernel.org/patchwork/patch/1442338/> as suggested
> in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026740.html>
>
> The first one I can't work out how to apply to
> linux-image-amd64/buster-backports because it's for 5.13 and too
> many other changes happened.
I see now that the first one went into 5.10.46:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/queue-5.10/swiotlb-manipulate-orig_addr-when-tlb_addr-has-offset.patch?id=fc7b255bfae6e62091006146ca685a25ec6f69c6
> The second looks like I could test it quite easily. It only got into
> upstream at 5.14-rc1 though.
I went ahead and tried out the second patch since it was easiest and
it seems to have fixed my issue!
To recap, I was able to reproduce the issue within seconds by
running this fio job inside a Xen guest:
    fio --name=randread \
        --filename=/srv/fio/test \
        --size=35g \
        --numjobs=1 \
        --rw=randread \
        --direct=1 \
        --ioengine=libaio \
        --iodepth=16 \
        --blocksize_range=4k-4m \
        --blocksize_unaligned=0 \
        --gtod_reduce=1 \
        --iodepth=64 \
        --time_based \
        --runtime=4h
Here /srv/fio/ is an ext4 filesystem on the first partition of a
block device. The partition is intentionally misaligned: it starts
63 sectors into the device instead of following the more modern
practice of starting at sector 2048 or similar.
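For illustration, the misalignment can be checked with a little arithmetic. This is only a sketch: the 512-byte sector size and the 4096-byte boundary are assumptions, not values taken from the affected machine, and the function names are made up for the example.

```python
# Sketch: is a partition's starting byte offset a multiple of a given boundary?
# Assumptions (not from the original report): 512-byte logical sectors and
# a 4096-byte alignment boundary.

SECTOR_SIZE = 512  # assumed logical sector size in bytes


def start_offset_bytes(start_sector, sector_size=SECTOR_SIZE):
    """Byte offset of the partition start from the beginning of the device."""
    return start_sector * sector_size


def is_aligned(start_sector, boundary=4096, sector_size=SECTOR_SIZE):
    """True if the partition start falls exactly on a multiple of `boundary`."""
    return start_offset_bytes(start_sector, sector_size) % boundary == 0


if __name__ == "__main__":
    # 63 is the old-style start sector used in the test; 2048 is the modern one.
    for sector in (63, 2048):
        print(sector, start_offset_bytes(sector), is_aligned(sector))
```

Under those assumptions, a start at sector 63 is 32256 bytes into the device, which is not a multiple of 4096, while a start at sector 2048 is exactly 1 MiB in and is aligned.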
I see this patch made it into 5.10.50:
https://lore.kernel.org/patchwork/patch/1442338/
As far as I can see, it's present in neither buster-backports nor
bullseye. What is the process to get it into bullseye's kernel, and
can it be backported for buster? It seems like quite a serious issue:
NVMe is not exotic, and this looks like it can cause broken
filesystems and RAIDs.
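As a rough aid, the two stable versions mentioned in this message (5.10.46 for the swiotlb patch, 5.10.50 for the second patch) can be compared against a Debian package version. This is a sketch under assumptions: the helper names are hypothetical, the parsing only looks at the leading upstream version, and it says nothing about patches a distribution may have applied on top of its upstream base.

```python
import re

# Stable releases named in this message as first carrying each fix (5.10.y only).
FIXES = {
    "swiotlb orig_addr patch": (5, 10, 46),
    "patchwork 1442338": (5, 10, 50),
}


def upstream_version(debian_version):
    """Extract the leading upstream x.y[.z] from a Debian kernel package version."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", debian_version)
    if m is None:
        raise ValueError(f"cannot parse version: {debian_version!r}")
    # A missing patch level (e.g. "5.10") is treated as 0.
    return tuple(int(part) if part else 0 for part in m.groups())


def missing_fixes(debian_version):
    """List the fixes whose first 5.10.y release is newer than this version."""
    version = upstream_version(debian_version)
    if version[:2] != (5, 10):
        # The release numbers above only make sense within the 5.10.y series.
        raise ValueError("this sketch only covers the 5.10.y stable series")
    return [name for name, first in FIXES.items() if version < first]


if __name__ == "__main__":
    print(missing_fixes("5.10.40-1~bpo10+1"))
```

For the package in the subject line, 5.10.40-1~bpo10+1, both fixes are reported as missing, which matches what was observed above.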
I did test out the first patch as well but it had no effect that I
could see.
Thanks,
Andy