
5.10.40-1~bpo10+1: nvme - Invalid SGL for payload:131072 nents:13

Hi Debian Kernel team,

I upgraded a buster system to the buster-backports kernel and hit
this bug within a couple of hours:

Jul 20 02:17:54 lamb kernel: [21061.388607] sg[0] phys_addr:0x00000015eb803000 offset:0 length:4096 dma_address:0x000000209e7b7000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.389775] sg[1] phys_addr:0x00000015eb7bc000 offset:0 length:4096 dma_address:0x000000209e7b8000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.390874] sg[2] phys_addr:0x00000015eb809000 offset:0 length:4096 dma_address:0x000000209e7b9000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.391974] sg[3] phys_addr:0x00000015eb766000 offset:0 length:4096 dma_address:0x000000209e7ba000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.393042] sg[4] phys_addr:0x00000015eb7a3000 offset:0 length:4096 dma_address:0x000000209e7bb000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.394086] sg[5] phys_addr:0x00000015eb7c6000 offset:0 length:4096 dma_address:0x000000209e7bc000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.395078] sg[6] phys_addr:0x00000015eb7c2000 offset:0 length:4096 dma_address:0x000000209e7bd000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.396042] sg[7] phys_addr:0x00000015eb7a9000 offset:0 length:4096 dma_address:0x000000209e7be000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.397004] sg[8] phys_addr:0x00000015eb775000 offset:0 length:4096 dma_address:0x000000209e7bf000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.397971] sg[9] phys_addr:0x00000015eb7c7000 offset:0 length:4096 dma_address:0x00000020ff520000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.398889] sg[10] phys_addr:0x00000015eb7cb000 offset:0 length:4096 dma_address:0x00000020ff521000 dma_length:4096
Jul 20 02:17:54 lamb kernel: [21061.399814] sg[11] phys_addr:0x00000015eb7e3000 offset:0 length:61952 dma_address:0x00000020ff522000 dma_length:61952
Jul 20 02:17:54 lamb kernel: [21061.400754] sg[12] phys_addr:0x00000015eb7f2200 offset:512 length:24064 dma_address:0x00000020ff531200 dma_length:24064
Jul 20 02:17:54 lamb kernel: [21061.401781] ------------[ cut here ]------------
Jul 20 02:17:54 lamb kernel: [21061.402738] Invalid SGL for payload:131072 nents:13
Jul 20 02:17:54 lamb kernel: [21061.403724] WARNING: CPU: 1 PID: 12669 at drivers/nvme/host/pci.c:716 nvme_map_data+0x7e0/0x820 [nvme]
Jul 20 02:17:54 lamb kernel: [21061.404728] Modules linked in: binfmt_misc ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpmss nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit nfnetlink_log nfnetlink xt_NFLOG xt_multiport xt_tcpudp ip6table_filter ip6_tables iptable_filter bonding btrfs blake2b_generic dm_snapshot dm_bufio intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm intel_powerclamp crc32_pclmul ghash_clmulni_intel ipmi_ssif aesni_intel libaes crypto_simd cryptd glue_helper snd_hda_intel snd_intel_dspcfg mei_wdt soundwire_intel soundwire_generic_allocation nvme wdat_wdt snd_soc_core ast snd_compress watchdog drm_vram_helper drm_ttm_helper soundwire_cadence pcspkr nvme_core ttm snd_hda_codec drm_kms_helper snd_hda_core i2c_i801 snd_hwdep i2c_smbus cec soundwire_bus snd_pcm drm snd_timer snd soundcore igb ptp pps_core i2c_algo_bit joydev mei_me sg mei intel_lpss_pci intel_lpss idma64 acpi_ipmi ipmi_si ipmi_devintf ioatdma dca wmi ipmi_msghandler button dm_mod xenfs xen_acpi_processor
Jul 20 02:17:54 lamb kernel: [21061.404831]  xen_privcmd xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx evdev hid_generic usbhid hid raid6_pq raid0 multipath linear raid10 raid1 md_mod sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common xhci_pci ahci libahci crc32c_intel xhci_hcd libata usbcore scsi_mod usb_common
Jul 20 02:17:54 lamb kernel: [21061.417998] CPU: 1 PID: 12669 Comm: 62.xvda-0 Not tainted 5.10.0-0.bpo.7-amd64 #1 Debian 5.10.40-1~bpo10+1
Jul 20 02:17:54 lamb kernel: [21061.418459] Hardware name: Supermicro Super Server/X11SRM-VF, BIOS 1.2a 02/18/2019
Jul 20 02:17:54 lamb kernel: [21061.418922] RIP: e030:nvme_map_data+0x7e0/0x820 [nvme]
Jul 20 02:17:54 lamb kernel: [21061.419354] Code: d0 7b c0 48 c7 c7 40 d6 7b c0 e8 5b 44 c9 c0 8b 93 4c 01 00 00 f6 43 1e 04 75 36 8b 73 28 48 c7 c7 20 9c 7b c0 e8 8b 71 09 c1 <0f> 0b 41 bd 0a 00 00 00 e9 f7 fe ff ff 48 8d bd 68 02 00 00 48 89
Jul 20 02:17:54 lamb kernel: [21061.420271] RSP: e02b:ffffc90044797930 EFLAGS: 00010286
Jul 20 02:17:54 lamb kernel: [21061.420727] RAX: 0000000000000000 RBX: ffff888157db4200 RCX: 0000000000000027
Jul 20 02:17:54 lamb kernel: [21061.421186] RDX: 0000000000000027 RSI: ffff888292858a00 RDI: ffff888292858a08
Jul 20 02:17:54 lamb kernel: [21061.421639] RBP: ffff888103243000 R08: 0000000000000000 R09: c00000010000118b
Jul 20 02:17:54 lamb kernel: [21061.422090] R10: 0000000000165920 R11: ffffc90044797738 R12: ffffffffc07b9bd0
Jul 20 02:17:54 lamb kernel: [21061.422583] R13: 000000000000000d R14: 0000000000000000 R15: 000000000000000d
Jul 20 02:17:54 lamb kernel: [21061.423052] FS:  0000000000000000(0000) GS:ffff888292840000(0000) knlGS:0000000000000000
Jul 20 02:17:54 lamb kernel: [21061.423518] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 02:17:54 lamb kernel: [21061.423986] CR2: 00007f909a037c30 CR3: 000000010d2dc000 CR4: 0000000000050660
Jul 20 02:17:54 lamb kernel: [21061.424472] Call Trace:
Jul 20 02:17:54 lamb kernel: [21061.424943]  nvme_queue_rq+0x98/0x190 [nvme]
Jul 20 02:17:54 lamb kernel: [21061.425425]  blk_mq_dispatch_rq_list+0x123/0x7d0
Jul 20 02:17:54 lamb kernel: [21061.425904]  ? sbitmap_get+0x66/0x140
Jul 20 02:17:54 lamb kernel: [21061.426385]  ? elv_rb_del+0x1f/0x30
Jul 20 02:17:54 lamb kernel: [21061.426909]  ? deadline_remove_request+0x55/0xc0
Jul 20 02:17:54 lamb kernel: [21061.427373]  __blk_mq_do_dispatch_sched+0x164/0x2d0
Jul 20 02:17:54 lamb kernel: [21061.427843]  __blk_mq_sched_dispatch_requests+0x135/0x170
Jul 20 02:17:54 lamb kernel: [21061.428310]  blk_mq_sched_dispatch_requests+0x30/0x60
Jul 20 02:17:54 lamb kernel: [21061.428795]  __blk_mq_run_hw_queue+0x51/0xd0
Jul 20 02:17:54 lamb kernel: [21061.429269]  __blk_mq_delay_run_hw_queue+0x141/0x160
Jul 20 02:17:54 lamb kernel: [21061.429752]  blk_mq_sched_insert_requests+0x6a/0xf0
Jul 20 02:17:54 lamb kernel: [21061.430233]  blk_mq_flush_plug_list+0x119/0x1b0
Jul 20 02:17:54 lamb kernel: [21061.430756]  blk_flush_plug_list+0xd7/0x100
Jul 20 02:17:54 lamb kernel: [21061.431241]  blk_finish_plug+0x21/0x30
Jul 20 02:17:54 lamb kernel: [21061.431734]  dispatch_rw_block_io+0x6a5/0x9a0 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.432220]  __do_block_io_op+0x31d/0x620 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.432714]  ? _raw_spin_unlock_irqrestore+0x14/0x20
Jul 20 02:17:54 lamb kernel: [21061.433193]  ? try_to_del_timer_sync+0x4d/0x80
Jul 20 02:17:54 lamb kernel: [21061.433680]  xen_blkif_schedule+0xda/0x670 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.434160]  ? __schedule+0x2c6/0x770
Jul 20 02:17:54 lamb kernel: [21061.434679]  ? finish_wait+0x80/0x80
Jul 20 02:17:54 lamb kernel: [21061.435129]  ? xen_blkif_be_int+0x30/0x30 [xen_blkback]
Jul 20 02:17:54 lamb kernel: [21061.435571]  kthread+0x116/0x130
Jul 20 02:17:54 lamb kernel: [21061.436002]  ? kthread_park+0x80/0x80
Jul 20 02:17:54 lamb kernel: [21061.436422]  ret_from_fork+0x22/0x30
Jul 20 02:17:54 lamb kernel: [21061.436846] ---[ end trace 1d90be7aea2d9148 ]---
Jul 20 02:17:54 lamb kernel: [21061.437250] blk_update_request: I/O error, dev nvme0n1, sector 912000815 op 0x1:(WRITE) flags 0x800 phys_seg 13 prio class 0
Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Disk failure on nvme0n1, disabling device.
Jul 20 02:17:54 lamb kernel: [21061.446344] md/raid1:md4: Operation continuing on 1 devices.

I've reported this on the linux-nvme list at
<http://lists.infradead.org/pipermail/linux-nvme/2021-July/026637.html>
and the feedback there is that I/O is somehow being submitted whose
segments are not aligned and sized in multiples of 4096 bytes, which
the nvme driver rejects.
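
For my own understanding I wrote down the rule being checked. This
is only a sketch of my reading of the PRP setup path in
drivers/nvme/host/pci.c, not the actual driver code: every segment
after the first must start on a 4096-byte boundary, and every
segment before the last must end on one. In the trace above, sg[11]
ends at ...531200 and sg[12] starts at offset 512, so both rules are
broken for this request:

/* Sketch of the PRP alignment rule as I understand it -- NOT the
 * actual driver code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVME_CTRL_PAGE_SIZE 4096u

struct seg {
    uint64_t dma_address;
    uint32_t dma_length;
};

bool prp_segments_valid(const struct seg *sg, size_t nents)
{
    for (size_t i = 0; i < nents; i++) {
        /* all but the first segment must start page-aligned */
        if (i > 0 && (sg[i].dma_address % NVME_CTRL_PAGE_SIZE))
            return false;
        /* all but the last segment must end page-aligned */
        if (i + 1 < nents &&
            (sg[i].dma_address + sg[i].dma_length) % NVME_CTRL_PAGE_SIZE)
            return false;
    }
    return true;
}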

I've since managed to reproduce it on test hardware.

This is a Xen dom0 system, and the issue appears to happen
intermittently when a guest with a misaligned partition table does
heavy IO. That is, the sectors above are inside an LVM logical
volume that is exported to a guest as its xvda disk, and the admin
of the guest has partitioned it with the first partition starting at
sector 63 (63 * 512 = 32256 bytes, which is not a multiple of 4096)
instead of the more modern sector 2048. By replicating such a guest
setup I can provoke the issue within a few seconds using fio; a
rough userspace analogue of the I/O pattern is sketched below.
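
In case it helps, this is roughly the kind of I/O I believe the
guest ends up generating: a large write that is 512-byte-aligned but
not 4096-byte-aligned. The device path is a placeholder for my test
LV, a 512-byte logical block size is assumed, and since this runs in
dom0 rather than through blkback it may well not trip the warning by
itself; my actual reproduction is fio inside the guest:

/* Rough analogue only; the real reproduction is fio in a guest
 * whose first partition starts at sector 63.  /dev/vg0/testlv is a
 * placeholder for my test logical volume. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/vg0/testlv", O_WRONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* 128 KiB buffer aligned to 512 bytes only, matching the
     * payload:131072 request in the trace above */
    void *buf;
    if (posix_memalign(&buf, 512, 131072)) {
        perror("posix_memalign");
        close(fd);
        return 1;
    }
    memset(buf, 0xaa, 131072);

    /* sector 63 * 512 = 32256: divisible by 512 but not by 4096 */
    if (pwrite(fd, buf, 131072, 63 * 512) < 0)
        perror("pwrite");

    free(buf);
    close(fd);
    return 0;
}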

This did not occur with buster's 4.19 kernel, which was in use on
this hardware for nearly two years with the same oddly-partitioned
guest present, and I can't reproduce it with that kernel in tests.

Two commits have been suggested to me to try:

 - <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/dma/swiotlb.c?id=5f89468e2f060031cd89fd4287298e0eaf246bf6> as suggested in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026639.html>

 - <https://lore.kernel.org/patchwork/patch/1442338/> as suggested
   in <http://lists.infradead.org/pipermail/linux-nvme/2021-July/026740.html>

I can't work out how to apply the first one to
linux-image-amd64/buster-backports, because it's against 5.13 and
too much else has changed in between.

The second looks like something I could test quite easily, though it
only went into upstream at 5.14-rc1.

Alternatively, I could build the latest upstream 5.14, which
contains both of these commits, see whether the problem goes away,
and if it does, bisect back until it breaks again.

I'm not a kernel hacker: I can just about manage rebuilding Debian
kernel packages and bisecting those or upstream, but not real
backporting work. Which direction do you think I should explore?

Should I report this as a Debian bug against
linux-image-5.10.0-0.bpo.7-amd64 now? At this stage I can't confirm
whether it also happens with bullseye's kernel, as my test hardware
is tied up working on this under buster(-backports).

Thanks,
Andy

