Bug#1088733: linux-image-6.1.0-28-amd64: list_del corruption in cifs_put_smb_ses frequently causes system hang
Hi Michael,
On Sat, Nov 30, 2024 at 03:45:20AM +0100, Michael wrote:
> Dear Maintainer,
>
> * What led up to the situation?
>
> We have a CIFS (DFS) infrastructure, and at some point after mounting
> these shares the kernel on this client reports a BUG; a few minutes later
> the whole system hangs:
>
> ##################### TRACE
>
> CIFS: VFS: \\SOME.SERVER.FQDN cifs_put_smb_ses: Session Logoff failure rc=-11
> CIFS: VFS: \\(null) cifs_put_smb_ses: Session Logoff failure rc=-11
> list_del corruption, ffff966536fe7800->next is NULL
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:49!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
> CPU: 6 PID: 2498151 Comm: kworker/6:9 Tainted: G OE 6.1.0-23-amd64 #1 Debian 6.1.99-1
> Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 2.9.0 12/06/2019
> Workqueue: events delayed_mntput
> RIP: 0010:__list_del_entry_valid.cold+0xf/0x6f
> Code: c7 c7 88 3c fa a0 e8 90 a0 fe ff 0f 0b 48 c7 c7 60 3c fa a0 e8 82 a0 fe ff 0f 0b 48 89 fe 48 c7 c7 70 3d fa a0 e8 71 a0 fe ff <0f> 0b 48 89 d1 48 c7 c7 90 3e fa a0 48 89 c2 e8 5d a0 fe ff 0f 0b
> RSP: 0018:ffffad83a63f7dd0 EFLAGS: 00010246
> RAX: 0000000000000033 RBX: ffff966536fe7800 RCX: 0000000000000027
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff965e7f8e03a0
> RBP: 00000000142d66a6 R08: 0000000000000000 R09: ffffad83a63f7c68
> R10: 0000000000000003 R11: ffff966ebff11be0 R12: 00000000fffffff5
> R13: ffff966536fe7000 R14: ffff966536fe7020 R15: ffffffffa1770b88
> FS: 0000000000000000(0000) GS:ffff965e7f8c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe35dbcb7b0 CR3: 0000000f36c10001 CR4: 00000000000606e0
> Call Trace:
> <TASK>
> ? __die_body.cold+0x1a/0x1f
> ? die+0x2a/0x50
> ? do_trap+0xc5/0x110
> ? __list_del_entry_valid.cold+0xf/0x6f
> ? do_error_trap+0x6a/0x90
> ? __list_del_entry_valid.cold+0xf/0x6f
> ? exc_invalid_op+0x4c/0x60
> ? __list_del_entry_valid.cold+0xf/0x6f
> ? asm_exc_invalid_op+0x16/0x20
> ? __list_del_entry_valid.cold+0xf/0x6f
> cifs_put_smb_ses+0xbb/0x3e0 [cifs]
> mount_group_release+0x82/0xa0 [cifs]
> cifs_umount+0x88/0xa0 [cifs]
> deactivate_locked_super+0x2f/0xa0
> cleanup_mnt+0xbd/0x150
> delayed_mntput+0x28/0x40
> process_one_work+0x1c7/0x380
> worker_thread+0x4d/0x380
> ? rescuer_thread+0x3a0/0x3a0
> kthread+0xda/0x100
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x22/0x30
> </TASK>
> Modules linked in: bluetooth jitterentropy_rng drbg ansi_cprng ecdh_generic rfkill ecc overlay isofs cmac nls_utf8 cifs cifs_arc4 cifs_md4 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs tls beegfs(OE) rpcrdma rdma_ucm ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel ib_core nft_chain_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif binfmt_misc kvm irqbypass ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl dcdbas mgag200 intel_cstate joydev evdev drm_shmem_helper intel_uncore iTCO_wdt ipmi_si drm_kms_helper mei_me intel_pmc_bxt ipmi_devintf iTCO_vendor_support pcspkr i2c_algo_bit mei ipmi_msghandler watchdog sg acpi_power_meter button nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm fuse loop efi_pstore configfs
> ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 dm_mod hid_generic usbhid hid sd_mod t10_pi sr_mod cdrom crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci libahci crct10dif_pclmul crct10dif_common crc32_pclmul libata ehci_pci bnx2x ehci_hcd megaraid_sas usbcore scsi_mod lpc_ich usb_common mdio libcrc32c crc32c_generic scsi_common crc32c_intel wmi
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:__list_del_entry_valid.cold+0xf/0x6f
> Code: c7 c7 88 3c fa a0 e8 90 a0 fe ff 0f 0b 48 c7 c7 60 3c fa a0 e8 82 a0 fe ff 0f 0b 48 89 fe 48 c7 c7 70 3d fa a0 e8 71 a0 fe ff <0f> 0b 48 89 d1 48 c7 c7 90 3e fa a0 48 89 c2 e8 5d a0 fe ff 0f 0b
> RSP: 0018:ffffad83a63f7dd0 EFLAGS: 00010246
> RAX: 0000000000000033 RBX: ffff966536fe7800 RCX: 0000000000000027
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff965e7f8e03a0
> RBP: 00000000142d66a6 R08: 0000000000000000 R09: ffffad83a63f7c68
> R10: 0000000000000003 R11: ffff966ebff11be0 R12: 00000000fffffff5
> R13: ffff966536fe7000 R14: ffff966536fe7020 R15: ffffffffa1770b88
> FS: 0000000000000000(0000) GS:ffff965e7f8c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe35dbcb7b0 CR3: 0000000f36c10001 CR4: 00000000000606e0
> note: kworker/6:9[2498151] exited with preempt_count 1
>
>
> END TRACE #####################
>
> This looks fairly similar to what has been reported with
> https://security-tracker.debian.org/tracker/CVE-2024-35870 except that
> it's not triggered by smb2_reconnect_server, but rather by some
> DFS-related code. I manually backported the fix from
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=24a9799aa8efecd0eb55a75e35f9d8e6400063aa
> to 6.1.119 and am testing it (successfully so far).
Thanks a lot for reporting this back. This commit was in fact
Cc'ed to stable, but it was only applied to 6.6.29 and 6.8.5 because it
failed to apply to the other stable series.
The failure notice relevant for 6.1.y is
https://lore.kernel.org/stable/2024040834-magazine-audience-8aa4@gregkh/
Let us see what we can do upstream; once the fix lands in 6.1.y, we can
fix it in bookworm's 6.1.y-based kernel as well.
Regards,
Salvatore