
Re: 5.10.40-1~bpo10+1: nvme - Invalid SGL for payload:131072 nents:13



Hello!

On Sat, Jul 24, 2021 at 08:01:10PM +0000, Andy Smith wrote:
> I've been suggested two commits to try:
> 
>  - <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/dma/swiotlb.c?id=5f89468e2f060031cd89fd4287298e0eaf246bf6> as suggested in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026639.html>
> 
>  - <https://lore.kernel.org/patchwork/patch/1442338/> as suggested
>    in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026740.html>
> 
> The first one I can't work out how to apply to
> linux-image-amd64/buster-backports because it's for 5.13 and too
> many other changes happened.

I see now that the first one went into 5.10.46:

    https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/queue-5.10/swiotlb-manipulate-orig_addr-when-tlb_addr-has-offset.patch?id=fc7b255bfae6e62091006146ca685a25ec6f69c6

> The second looks like I could test it quite easily. It only got into
> upstream at 5.14-rc1 though.

I went ahead and tried out the second patch since it was easiest and
it seems to have fixed my issue!

To recap, I was able to reproduce the issue within seconds by
running this fio job inside a Xen guest:

fio --name=randread \
    --filename=/srv/fio/test \
    --size=35g \
    --numjobs=1 \
    --rw=randread \
    --direct=1 \
    --ioengine=libaio \
    --blocksize_range=4k-4m \
    --blocksize_unaligned=0 \
    --gtod_reduce=1 \
    --iodepth=64 \
    --time_based \
    --runtime=4h

Here /srv/fio/ is an ext4 filesystem on the first partition of a
block device. The partition is intentionally misaligned: it starts
63 sectors into the device rather than at sector 2048, as is the
more modern practice.
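
As a rough illustration of why a start at sector 63 counts as
misaligned, here is a minimal sketch of the arithmetic. It assumes
512-byte logical sectors and a 4096-byte alignment boundary; both
values, and the helper name is_aligned, are assumptions for
illustration only and were not measured on the device above.

```shell
# is_aligned START_SECTOR [SECTOR_BYTES] [BOUNDARY_BYTES]
# Succeeds (exit status 0) if the partition's byte offset is a
# multiple of the boundary. The defaults are assumptions: 512-byte
# sectors and a 4096-byte boundary.
is_aligned() {
    start=$1
    sector=${2:-512}
    boundary=${3:-4096}
    [ $(( (start * sector) % boundary )) -eq 0 ]
}

for s in 63 2048; do
    if is_aligned "$s"; then
        echo "start sector $s: aligned"
    else
        echo "start sector $s: misaligned"
    fi
done
```

Under those assumptions, sector 63 is byte offset 32256, which is
not a multiple of 4096, while sector 2048 is byte offset 1048576,
which is.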

I see this patch made it into 5.10.50:

    https://lore.kernel.org/patchwork/patch/1442338/

It's not present in buster-backports or in bullseye, as far as I
can see. What is the process to get it into bullseye's kernel, and
can it be backported for buster? It seems like quite a serious
issue, as NVMe is not exotic and this looks like it can cause broken
filesystems and RAIDs.

I did test out the first patch as well but it had no effect that I
could see.

Thanks,
Andy

