
Re: 5.10.40-1~bpo10+1: nvme - Invalid SGL for payload:131072 nents:13



Hi Andy,

On Sun, Jul 25, 2021 at 10:01:02AM +0000, Andy Smith wrote:
> Hello!
> 
> On Sat, Jul 24, 2021 at 08:01:10PM +0000, Andy Smith wrote:
> > Two commits have been suggested for me to try:
> > 
> >  - <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/dma/swiotlb.c?id=5f89468e2f060031cd89fd4287298e0eaf246bf6> as suggested in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026639.html>
> > 
> >  - <https://lore.kernel.org/patchwork/patch/1442338/> as suggested
> >    in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026740.html>
> > 
> > The first one I can't work out how to apply to
> > linux-image-amd64/buster-backports because it targets 5.13 and too
> > much else has changed since.
> 
> I see now that the first one went in 5.10.46:
> 
>     https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/queue-5.10/swiotlb-manipulate-orig_addr-when-tlb_addr-has-offset.patch?id=fc7b255bfae6e62091006146ca685a25ec6f69c6
> 
> > The second looks like I could test it quite easily. It only got into
> > upstream at 5.14-rc1 though.
> 
> I went ahead and tried out the second patch since it was easiest and
> it seems to have fixed my issue!
> 
> To recap, I was able to reproduce the issue within seconds by
> running this fio job inside a Xen guest:
> 
> fio --name=randread \
>     --filename=/srv/fio/test \
>     --size=35g \
>     --numjobs=1 \
>     --rw=randread \
>     --direct=1 \
>     --ioengine=libaio \
>     --iodepth=16 \
>     --blocksize_range=4k-4m \
>     --blocksize_unaligned=0 \
>     --gtod_reduce=1 \
>     --iodepth=64 \
>     --time_based \
>     --runtime=4h
> 
> Where /srv/fio/ is an ext4 filesystem on the first partition of
> a block device, intentionally misaligned by starting it 63 sectors
> into the device instead of following the more modern practice of
> starting at sector 2048.
> 
> I see this patch made it into 5.10.50:
> 
>     https://lore.kernel.org/patchwork/patch/1442338/
> 
> It's not present in buster-backports or in bullseye, as far as I
> can see. What is the process for getting it into bullseye's kernel,
> and can it be backported for buster? It seems like quite a serious
> issue, as NVMe is not exotic and this can apparently cause broken
> filesystems and RAIDs.

Thanks for your extensive investigation. I will cherry-pick the patch
for the next bullseye upload (so it will also go into buster-backports
once possible).

Regards,
Salvatore

