Re: 5.10.40-1~bpo10+1: nvme - Invalid SGL for payload:131072 nents:13

To: Andy Smith <andy@strugglers.net>
Cc: debian-kernel@lists.debian.org
Subject: Re: 5.10.40-1~bpo10+1: nvme - Invalid SGL for payload:131072 nents:13
From: Salvatore Bonaccorso <carnil@debian.org>
Date: Sun, 25 Jul 2021 15:28:48 +0200
Message-id: <YP1nELHSzFtnC7zm@eldamar.lan>
Mail-followup-to: Andy Smith <andy@strugglers.net>, debian-kernel@lists.debian.org
In-reply-to: <20210725100102.3dlsybc6r2qiofwc@bitfolk.com>
References: <20210724200110.6xn4pyshinzq4dbh@bitfolk.com> <20210725100102.3dlsybc6r2qiofwc@bitfolk.com>

Hi Andy,

On Sun, Jul 25, 2021 at 10:01:02AM +0000, Andy Smith wrote:
> Hello!
> 
> On Sat, Jul 24, 2021 at 08:01:10PM +0000, Andy Smith wrote:
> > I've been suggested two commits to try:
> > 
> >  - <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/dma/swiotlb.c?id=5f89468e2f060031cd89fd4287298e0eaf246bf6> as suggested in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026639.html>
> > 
> >  - <https://lore.kernel.org/patchwork/patch/1442338/> as suggested
> >    in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026740.html>
> > 
> > The first one I can't work out how to apply to
> > linux-image-amd64/buster-backports because it's for 5.13 and too
> > many other changes happened.
> 
> I see now that the first one went in 5.10.46:
> 
>     https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/queue-5.10/swiotlb-manipulate-orig_addr-when-tlb_addr-has-offset.patch?id=fc7b255bfae6e62091006146ca685a25ec6f69c6
> 
> > The second looks like I could test it quite easily. It only got into
> > upstream at 5.14-rc1 though.
> 
> I went ahead and tried out the second patch since it was easiest and
> it seems to have fixed my issue!
> 
> To recap, I was able to reproduce the issue within seconds by
> running this fio job inside a Xen guest:
> 
> fio --name=randread \
>     --filename=/srv/fio/test \
>     --size=35g \
>     --numjobs=1 \
>     --rw=randread \
>     --direct=1 \
>     --ioengine=libaio \
>     --iodepth=16 \
>     --blocksize_range=4k-4m \
>     --blocksize_unaligned=0 \
>     --gtod_reduce=1 \
>     --iodepth=64 \
>     --time_based \
>     --runtime=4h
> 
> Where /srv/fio/ is an ext4 filesystem that is the first partition on
> a block device, intentionally misaligned by starting it 63 sectors
> in to the device instead of the more modern practice of 2048 or
> whatever.
> 
> I see this patch made it in to 5.10.50:
> 
>     https://lore.kernel.org/patchwork/patch/1442338/
> 
> It's not present in buster-backports nor in bullseye, as far as I
> can see. What is the process to get it in to bullseye's kernel and
> can it be backported for buster? It seems like quite a serious issue
> as NVMe is not exotic and this seems like it can cause broken
> filesystems and RAIDs.
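
For readers following along, the misalignment described in the quoted
message (a first partition starting at sector 63 rather than 2048) can
be checked with a little arithmetic. This is only a minimal sketch: the
function name alignment_of is hypothetical, and the 512-byte sector size
and 4096-byte boundary are assumed defaults, not values stated in the
thread.

```shell
#!/bin/sh
# Report whether a partition's starting sector falls on an alignment boundary.
# Assumptions (not from the thread): 512-byte logical sectors and a
# 4096-byte boundary unless overridden by the 2nd and 3rd arguments.
alignment_of() {
    start_sector=$1
    sector_bytes=${2:-512}
    boundary_bytes=${3:-4096}
    # Byte offset of the partition start from the beginning of the device.
    offset_bytes=$((start_sector * sector_bytes))
    if [ $((offset_bytes % boundary_bytes)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

alignment_of 63     # 63 * 512 = 32256 bytes, remainder 3584 against 4096
alignment_of 2048   # 2048 * 512 = 1048576 bytes, a multiple of 4096
```

With those assumptions a start at sector 63 is reported as misaligned
and a start at sector 2048 as aligned.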

Thanks for your extensive investigation. I will cherry-pick the patch
for the next bullseye upload, so it will also go into buster-backports
once possible.
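
As a rough aid for comparing an upstream kernel version against the
5.10.50 release mentioned in the quoted message, something like the
following could be used. It is only a sketch: the function name
version_at_least is hypothetical, it relies on the -V (version sort)
option of GNU sort, and it assumes the Debian revision has already been
stripped from the version string. A version comparison alone is not
conclusive for Debian kernels, since individual patches can be
cherry-picked into a package ahead of the stable release that carries
them.

```shell
#!/bin/sh
# Print "yes" if version $1 is at least version $2, otherwise "no".
# Relies on GNU sort's -V option; inputs are assumed to be plain
# upstream versions such as 5.10.40 (an assumption for illustration).
version_at_least() {
    # The lower of the two versions sorts first.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    if [ "$lowest" = "$2" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

version_at_least "5.10.40" "5.10.50"   # upstream version of the reported package
version_at_least "5.10.50" "5.10.50"
```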

Regards,
Salvatore
