Bug#1099591: linux-image-6.12.12-amd64: crash in netfs module when copying (large) directory from smb share
On Wed, 5 Mar 2025 at 14:24, Salvatore Bonaccorso
<carnil@debian.org> wrote:
>
> Control: forcemerge 1098698 -1
>
> Hi Norbert,
>
> On Wed, Mar 05, 2025 at 12:15:29PM +0100, Norbert Lange wrote:
> > Package: src:linux
> > Version: 6.12.12-1
> > Severity: important
> > X-Debbugs-Cc: nolange79@gmail.com
> >
> > Dear Maintainer,
> >
> > I experience an immediate Kernel Crash when copying large files/directories from
> > a mounted Samba share.
> > I can consistently reproduce the crash in Qemu (from which I grabbed the Log).
> >
> > My content of `/etc/fstab`:
> >
> > ```
> > //vienas01.andritz.com/HIPASE /run/media/HIPASE_Q smb3 credentials=/tmp/creds.txt,uid=1000,user,vers=3,nofail,noatime,noauto 0 0
> > ```
> >
> > The sequence leading to the crash is a filecopy to local HDD:
> >
> > ``` bash
> > mount /run/media/HIPASE_Q
> > cp -r /run/media/HIPASE_Q/DIR ~/Download
> > ```
> >
> > Output from mount is:
> >
> > ```
> > //vienas01.andritz.com/HIPASE on /run/media/HIPASE_Q type smb3 (rw,nosuid,nodev,relatime,vers=3.1.1,cache=strict,username=XXX,domain=YYY,uid=1000,forceuid,gid=1000,forcegid,addr=172.24.180.161,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,reparse=nfs,rsize=65536,wsize=65536,bsize=1048576,retrans=1,echo_interval=60,actimeo=1,closetimeo=1,user=XXX)
> > ```
> >
> > The crash is diagnosed as follows (again, under Qemu with the same kernel):
> >
> > ```
> > Debian GNU/Linux trixie/sid debian-replace ttyS0
> >
> > debian-replace login:
> > [ 222.366764] BUG: kernel NULL pointer dereference, address: 0000000000000068
> > [ 222.367079] #PF: supervisor read access in kernel mode
> > [ 222.367268] #PF: error_code(0x0000) - not-present page
> > [ 222.367465] PGD 0 P4D 0
> > [ 222.367565] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 222.367757] CPU: 1 UID: 0 PID: 45 Comm: kworker/1:1 Not tainted 6.12.12-amd64 #1 Debian 6.12.12-1
> > [ 222.368074] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > [ 222.368456] Workqueue: cifsiod smb2_readv_worker [cifs]
> > [ 222.368715] RIP: 0010:netfs_consume_read_data.isra.0 (fs/netfs/read_collect.c:262) netfs
> > [ 222.368985] Code: 74 24 10 4c 89 fb 49 8b 47 68 48 85 d2 0f 85 ce 01 00 00 48 8b 4c 24 30 49 8b 7f 30 48 83 c1 70 48 39 cf 74 17 4c 8b 5c 24 40 <49> 8b 73 68 49 03 73 60 49 39 77 60 0f 84 b2 04 00 00 48 29 d0 4c
> > All code
> > ========
> > 0: 74 24 je 0x26
> > 2: 10 4c 89 fb adc %cl,-0x5(%rcx,%rcx,4)
> > 6: 49 8b 47 68 mov 0x68(%r15),%rax
> > a: 48 85 d2 test %rdx,%rdx
> > d: 0f 85 ce 01 00 00 jne 0x1e1
> > 13: 48 8b 4c 24 30 mov 0x30(%rsp),%rcx
> > 18: 49 8b 7f 30 mov 0x30(%r15),%rdi
> > 1c: 48 83 c1 70 add $0x70,%rcx
> > 20: 48 39 cf cmp %rcx,%rdi
> > 23: 74 17 je 0x3c
> > 25: 4c 8b 5c 24 40 mov 0x40(%rsp),%r11
> > 2a:* 49 8b 73 68 mov 0x68(%r11),%rsi <-- trapping instruction
> > 2e: 49 03 73 60 add 0x60(%r11),%rsi
> > 32: 49 39 77 60 cmp %rsi,0x60(%r15)
> > 36: 0f 84 b2 04 00 00 je 0x4ee
> > 3c: 48 29 d0 sub %rdx,%rax
> > 3f: 4c rex.WR
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 49 8b 73 68 mov 0x68(%r11),%rsi
> > 4: 49 03 73 60 add 0x60(%r11),%rsi
> > 8: 49 39 77 60 cmp %rsi,0x60(%r15)
> > c: 0f 84 b2 04 00 00 je 0x4c4
> > 12: 48 29 d0 sub %rdx,%rax
> > 15: 4c rex.WR
> > [ 222.369710] RSP: 0018:ffffbdc900177dd0 EFLAGS: 00010283
> > [ 222.369902] RAX: 0000000000010000 RBX: ffff96530204b280 RCX: ffff9653020d7770
> > [ 222.370160] RDX: 0000000000000000 RSI: 0000000000440000 RDI: ffff96530204b168
> > [ 222.370434] RBP: 0000000000000000 R08: 0000000000010000 R09: 0000000000000000
> > [ 222.370711] R10: 0000000000000008 R11: 0000000000000000 R12: ffff9653020d78e8
> > [ 222.371001] R13: 0000000000040000 R14: ffff9653020d78e8 R15: ffff96530204b280
> > [ 222.371268] FS: 0000000000000000(0000) GS:ffff96537bd00000(0000) knlGS:0000000000000000
> > [ 222.371561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 222.371764] CR2: 0000000000000068 CR3: 0000000106bb4000 CR4: 0000000000752ef0
> > [ 222.372036] PKRU: 55555554
> > [ 222.372147] Call Trace:
> > [ 222.372248] <TASK>
> > [ 222.372327] ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1))
> > [ 222.372492] ? page_fault_oops (arch/x86/mm/fault.c:711 (discriminator 1))
> > [ 222.372658] ? exc_page_fault (arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
> > [ 222.372808] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
> > [ 222.372961] ? netfs_consume_read_data.isra.0 (fs/netfs/read_collect.c:262) netfs
> > [ 222.373176] netfs_read_subreq_terminated (arch/x86/include/asm/bitops.h:94 include/asm-generic/bitops/instrumented-non-atomic.h:45 fs/netfs/read_collect.c:502) netfs
> > [ 222.373380] process_one_work (kernel/workqueue.c:3229)
> > [ 222.373525] worker_thread (kernel/workqueue.c:3304 (discriminator 2) kernel/workqueue.c:3391 (discriminator 2))
> > [ 222.373657] ? __pfx_worker_thread (kernel/workqueue.c:3337)
> > [ 222.373817] kthread (kernel/kthread.c:389)
> > [ 222.373929] ? __pfx_kthread (kernel/kthread.c:342)
> > [ 222.374077] ret_from_fork (arch/x86/kernel/process.c:147)
> > [ 222.374203] ? __pfx_kthread (kernel/kthread.c:342)
> > [ 222.374335] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
> > [ 222.374472] </TASK>
> > [ 222.374551] Modules linked in: cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 dns_resolver netfs uinput snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore rfkill nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common binfmt_misc intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd iTCO_wdt intel_pmc_bxt cryptd iTCO_vendor_support watchdog qxl rapl joydev serio_raw evdev pcspkr button vmwgfx drm_ttm_helper ttm drm_kms_helper drm configfs efi_pstore nfnetlink qemu_fw_cfg virtio_console virtio_rng ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic ahci libahci xhci_pci libata xhci_hcd nvme crc32_pclmul crc32c_intel i2c_i801 scsi_mod psmouse virtio_net usbcore net_failover failover i2c_smbus nvme_core scsi_common lpc_ich nvme_auth usb_common
> > [ 222.377319] CR2: 0000000000000068
> > [ 222.377437] ---[ end trace 0000000000000000 ]---
> > [ 222.377596] RIP: 0010:netfs_consume_read_data.isra.0 (fs/netfs/read_collect.c:262) netfs
> > [ 222.377839] Code: 74 24 10 4c 89 fb 49 8b 47 68 48 85 d2 0f 85 ce 01 00 00 48 8b 4c 24 30 49 8b 7f 30 48 83 c1 70 48 39 cf 74 17 4c 8b 5c 24 40 <49> 8b 73 68 49 03 73 60 49 39 77 60 0f 84 b2 04 00 00 48 29 d0 4c
> > All code
> > ========
> > 0: 74 24 je 0x26
> > 2: 10 4c 89 fb adc %cl,-0x5(%rcx,%rcx,4)
> > 6: 49 8b 47 68 mov 0x68(%r15),%rax
> > a: 48 85 d2 test %rdx,%rdx
> > d: 0f 85 ce 01 00 00 jne 0x1e1
> > 13: 48 8b 4c 24 30 mov 0x30(%rsp),%rcx
> > 18: 49 8b 7f 30 mov 0x30(%r15),%rdi
> > 1c: 48 83 c1 70 add $0x70,%rcx
> > 20: 48 39 cf cmp %rcx,%rdi
> > 23: 74 17 je 0x3c
> > 25: 4c 8b 5c 24 40 mov 0x40(%rsp),%r11
> > 2a:* 49 8b 73 68 mov 0x68(%r11),%rsi <-- trapping instruction
> > 2e: 49 03 73 60 add 0x60(%r11),%rsi
> > 32: 49 39 77 60 cmp %rsi,0x60(%r15)
> > 36: 0f 84 b2 04 00 00 je 0x4ee
> > 3c: 48 29 d0 sub %rdx,%rax
> > 3f: 4c rex.WR
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 49 8b 73 68 mov 0x68(%r11),%rsi
> > 4: 49 03 73 60 add 0x60(%r11),%rsi
> > 8: 49 39 77 60 cmp %rsi,0x60(%r15)
> > c: 0f 84 b2 04 00 00 je 0x4c4
> > 12: 48 29 d0 sub %rdx,%rax
> > 15: 4c rex.WR
> > [ 222.378484] RSP: 0018:ffffbdc900177dd0 EFLAGS: 00010283
> > [ 222.378666] RAX: 0000000000010000 RBX: ffff96530204b280 RCX: ffff9653020d7770
> > [ 222.378910] RDX: 0000000000000000 RSI: 0000000000440000 RDI: ffff96530204b168
> > [ 222.379153] RBP: 0000000000000000 R08: 0000000000010000 R09: 0000000000000000
> > [ 222.379396] R10: 0000000000000008 R11: 0000000000000000 R12: ffff9653020d78e8
> > [ 222.379638] R13: 0000000000040000 R14: ffff9653020d78e8 R15: ffff96530204b280
> > [ 222.379880] FS: 0000000000000000(0000) GS:ffff96537bd00000(0000) knlGS:0000000000000000
> > [ 222.380154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 222.380352] CR2: 0000000000000068 CR3: 0000000106bb4000 CR4: 0000000000752ef0
> > [ 222.380596] PKRU: 55555554
> > [ 222.380692] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 222.381450] Kernel Offset: 0xbc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 222.381829] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > ```
>
> Thanks for your report. I believe this shares the same root
> causes as #1098698, so I am going to merge both reports.
>
> If you have the possibility, please have a look at
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1098698#34
> and report back if that fixes your issue.
That specific patch seems to handle the issue with
'kernel BUG at fs/netfs/read_collect.c:315!',
not the NULL pointer dereference reported here.
> Max Kellermann has pointed
> out the open issues here:
> https://lore.kernel.org/netfs/CAKPOu+_WAM3RQJnHsKfEh5sG5tBuCPt1EWtoUFVC2ma=ORjHkg@mail.gmail.com/
It's hard to follow what is merged into which upstream branch/version, and
what is added on top in Debian. I am not sure which patches I should apply.
I tested the easily available versions in Debian:
linux-image-6.12.12-amd64 6.12.12-1 -> this bug report
linux-image-6.12.17-amd64 6.12.17-1 -> identical behavior
linux-image-6.13-amd64 6.13.5-1~exp1 -> 'kernel BUG at fs/netfs/read_collect.c:316!'
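To tell the two failure modes apart when testing further kernels, a rough sketch like the following could be run over the captured serial console log. The function name and the labels it prints are made up for illustration, and it only matches the two messages quoted in this thread, so any other crash shows up as "unknown".

```sh
# Hypothetical helper: classify a kernel log by the two crash signatures
# seen in this thread. Takes the whole log text as its first argument.
classify_netfs_crash() {
    case "$1" in
        # Assertion failure seen on 6.13.5 (line 316) and named for the patch (line 315)
        *"kernel BUG at fs/netfs/read_collect.c"*)
            echo "bug-assert" ;;
        # NULL pointer dereference seen on 6.12.12 and 6.12.17
        *"BUG: kernel NULL pointer dereference"*"netfs_consume_read_data"*)
            echo "null-deref" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, with the Qemu serial output saved to a file (the name console.log is an assumption), `classify_netfs_crash "$(cat console.log)"` prints one of the three labels.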
6.12 is an LTS kernel; isn't there a repository where all proposed
backports are available?
The situation is rather bad right now, as no workaround is available.
Regards, Norbert.