
Bug#1099591: linux-image-6.12.12-amd64: crash in netfs module when copying (large) directory from smb share



On Wed, 5 Mar 2025 at 14:24, Salvatore Bonaccorso
<carnil@debian.org> wrote:
>
> Control: forcemerge 1098698 -1
>
> Hi Norbert,
>
> On Wed, Mar 05, 2025 at 12:15:29PM +0100, Norbert Lange wrote:
> > Package: src:linux
> > Version: 6.12.12-1
> > Severity: important
> > X-Debbugs-Cc: nolange79@gmail.com
> >
> > Dear Maintainer,
> >
> > I experience an immediate kernel crash when copying large files/directories from
> > a mounted Samba share.
> > I can consistently reproduce the crash in QEMU (from which I grabbed the log).
> >
> > My content of `/etc/fstab`:
> >
> > ```
> > //vienas01.andritz.com/HIPASE  /run/media/HIPASE_Q   smb3 credentials=/tmp/creds.txt,uid=1000,user,vers=3,nofail,noatime,noauto   0    0
> > ```
> >
> > The sequence leading to the crash is a file copy to the local HDD:
> >
> > ``` bash
> > mount /run/media/HIPASE_Q
> > cp -r /run/media/HIPASE_Q/DIR ~/Download
> > ```
> >
> > Output from mount is:
> >
> > ```
> > //vienas01.andritz.com/HIPASE on /run/media/HIPASE_Q type smb3 (rw,nosuid,nodev,relatime,vers=3.1.1,cache=strict,username=XXX,domain=YYY,uid=1000,forceuid,gid=1000,forcegid,addr=172.24.180.161,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,reparse=nfs,rsize=65536,wsize=65536,bsize=1048576,retrans=1,echo_interval=60,actimeo=1,closetimeo=1,user=XXX)
> > ```
> >
> > The crash is diagnosed as follows (again, under Qemu with the same kernel):
> >
> > ```
> > Debian GNU/Linux trixie/sid debian-replace ttyS0
> >
> > debian-replace login:
> > [  222.366764] BUG: kernel NULL pointer dereference, address: 0000000000000068
> > [  222.367079] #PF: supervisor read access in kernel mode
> > [  222.367268] #PF: error_code(0x0000) - not-present page
> > [  222.367465] PGD 0 P4D 0
> > [  222.367565] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [  222.367757] CPU: 1 UID: 0 PID: 45 Comm: kworker/1:1 Not tainted 6.12.12-amd64 #1  Debian 6.12.12-1
> > [  222.368074] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > [  222.368456] Workqueue: cifsiod smb2_readv_worker [cifs]
> > [  222.368715] RIP: 0010:netfs_consume_read_data.isra.0 (fs/netfs/read_collect.c:262) netfs
> > [ 222.368985] Code: 74 24 10 4c 89 fb 49 8b 47 68 48 85 d2 0f 85 ce 01 00 00 48 8b 4c 24 30 49 8b 7f 30 48 83 c1 70 48 39 cf 74 17 4c 8b 5c 24 40 <49> 8b 73 68 49 03 73 60 49 39 77 60 0f 84 b2 04 00 00 48 29 d0 4c
> > All code
> > ========
> >    0: 74 24                   je     0x26
> >    2: 10 4c 89 fb             adc    %cl,-0x5(%rcx,%rcx,4)
> >    6: 49 8b 47 68             mov    0x68(%r15),%rax
> >    a: 48 85 d2                test   %rdx,%rdx
> >    d: 0f 85 ce 01 00 00       jne    0x1e1
> >   13: 48 8b 4c 24 30          mov    0x30(%rsp),%rcx
> >   18: 49 8b 7f 30             mov    0x30(%r15),%rdi
> >   1c: 48 83 c1 70             add    $0x70,%rcx
> >   20: 48 39 cf                cmp    %rcx,%rdi
> >   23: 74 17                   je     0x3c
> >   25: 4c 8b 5c 24 40          mov    0x40(%rsp),%r11
> >   2a:*        49 8b 73 68             mov    0x68(%r11),%rsi          <-- trapping instruction
> >   2e: 49 03 73 60             add    0x60(%r11),%rsi
> >   32: 49 39 77 60             cmp    %rsi,0x60(%r15)
> >   36: 0f 84 b2 04 00 00       je     0x4ee
> >   3c: 48 29 d0                sub    %rdx,%rax
> >   3f: 4c                      rex.WR
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: 49 8b 73 68             mov    0x68(%r11),%rsi
> >    4: 49 03 73 60             add    0x60(%r11),%rsi
> >    8: 49 39 77 60             cmp    %rsi,0x60(%r15)
> >    c: 0f 84 b2 04 00 00       je     0x4c4
> >   12: 48 29 d0                sub    %rdx,%rax
> >   15: 4c                      rex.WR
> > [  222.369710] RSP: 0018:ffffbdc900177dd0 EFLAGS: 00010283
> > [  222.369902] RAX: 0000000000010000 RBX: ffff96530204b280 RCX: ffff9653020d7770
> > [  222.370160] RDX: 0000000000000000 RSI: 0000000000440000 RDI: ffff96530204b168
> > [  222.370434] RBP: 0000000000000000 R08: 0000000000010000 R09: 0000000000000000
> > [  222.370711] R10: 0000000000000008 R11: 0000000000000000 R12: ffff9653020d78e8
> > [  222.371001] R13: 0000000000040000 R14: ffff9653020d78e8 R15: ffff96530204b280
> > [  222.371268] FS:  0000000000000000(0000) GS:ffff96537bd00000(0000) knlGS:0000000000000000
> > [  222.371561] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  222.371764] CR2: 0000000000000068 CR3: 0000000106bb4000 CR4: 0000000000752ef0
> > [  222.372036] PKRU: 55555554
> > [  222.372147] Call Trace:
> > [  222.372248]  <TASK>
> > [  222.372327] ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1))
> > [  222.372492] ? page_fault_oops (arch/x86/mm/fault.c:711 (discriminator 1))
> > [  222.372658] ? exc_page_fault (arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
> > [  222.372808] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
> > [  222.372961] ? netfs_consume_read_data.isra.0 (fs/netfs/read_collect.c:262) netfs
> > [  222.373176] netfs_read_subreq_terminated (arch/x86/include/asm/bitops.h:94 include/asm-generic/bitops/instrumented-non-atomic.h:45 fs/netfs/read_collect.c:502) netfs
> > [  222.373380] process_one_work (kernel/workqueue.c:3229)
> > [  222.373525] worker_thread (kernel/workqueue.c:3304 (discriminator 2) kernel/workqueue.c:3391 (discriminator 2))
> > [  222.373657] ? __pfx_worker_thread (kernel/workqueue.c:3337)
> > [  222.373817] kthread (kernel/kthread.c:389)
> > [  222.373929] ? __pfx_kthread (kernel/kthread.c:342)
> > [  222.374077] ret_from_fork (arch/x86/kernel/process.c:147)
> > [  222.374203] ? __pfx_kthread (kernel/kthread.c:342)
> > [  222.374335] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
> > [  222.374472]  </TASK>
> > [  222.374551] Modules linked in: cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 dns_resolver netfs uinput snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore rfkill nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common binfmt_misc intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd iTCO_wdt intel_pmc_bxt cryptd iTCO_vendor_support watchdog qxl rapl joydev serio_raw evdev pcspkr button vmwgfx drm_ttm_helper ttm drm_kms_helper drm configfs efi_pstore nfnetlink qemu_fw_cfg virtio_console virtio_rng ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic ahci libahci xhci_pci libata xhci_hcd nvme crc32_pclmul crc32c_intel i2c_i801 scsi_mod psmouse virtio_net usbcore net_failover failover i2c_smbus nvme_core scsi_common lpc_ich nvme_auth usb_common
> > [  222.377319] CR2: 0000000000000068
> > [  222.377437] ---[ end trace 0000000000000000 ]---
> > [  222.377596] RIP: 0010:netfs_consume_read_data.isra.0 (fs/netfs/read_collect.c:262) netfs
> > [ 222.377839] Code: 74 24 10 4c 89 fb 49 8b 47 68 48 85 d2 0f 85 ce 01 00 00 48 8b 4c 24 30 49 8b 7f 30 48 83 c1 70 48 39 cf 74 17 4c 8b 5c 24 40 <49> 8b 73 68 49 03 73 60 49 39 77 60 0f 84 b2 04 00 00 48 29 d0 4c
> > All code
> > ========
> >    0: 74 24                   je     0x26
> >    2: 10 4c 89 fb             adc    %cl,-0x5(%rcx,%rcx,4)
> >    6: 49 8b 47 68             mov    0x68(%r15),%rax
> >    a: 48 85 d2                test   %rdx,%rdx
> >    d: 0f 85 ce 01 00 00       jne    0x1e1
> >   13: 48 8b 4c 24 30          mov    0x30(%rsp),%rcx
> >   18: 49 8b 7f 30             mov    0x30(%r15),%rdi
> >   1c: 48 83 c1 70             add    $0x70,%rcx
> >   20: 48 39 cf                cmp    %rcx,%rdi
> >   23: 74 17                   je     0x3c
> >   25: 4c 8b 5c 24 40          mov    0x40(%rsp),%r11
> >   2a:*        49 8b 73 68             mov    0x68(%r11),%rsi          <-- trapping instruction
> >   2e: 49 03 73 60             add    0x60(%r11),%rsi
> >   32: 49 39 77 60             cmp    %rsi,0x60(%r15)
> >   36: 0f 84 b2 04 00 00       je     0x4ee
> >   3c: 48 29 d0                sub    %rdx,%rax
> >   3f: 4c                      rex.WR
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: 49 8b 73 68             mov    0x68(%r11),%rsi
> >    4: 49 03 73 60             add    0x60(%r11),%rsi
> >    8: 49 39 77 60             cmp    %rsi,0x60(%r15)
> >    c: 0f 84 b2 04 00 00       je     0x4c4
> >   12: 48 29 d0                sub    %rdx,%rax
> >   15: 4c                      rex.WR
> > [  222.378484] RSP: 0018:ffffbdc900177dd0 EFLAGS: 00010283
> > [  222.378666] RAX: 0000000000010000 RBX: ffff96530204b280 RCX: ffff9653020d7770
> > [  222.378910] RDX: 0000000000000000 RSI: 0000000000440000 RDI: ffff96530204b168
> > [  222.379153] RBP: 0000000000000000 R08: 0000000000010000 R09: 0000000000000000
> > [  222.379396] R10: 0000000000000008 R11: 0000000000000000 R12: ffff9653020d78e8
> > [  222.379638] R13: 0000000000040000 R14: ffff9653020d78e8 R15: ffff96530204b280
> > [  222.379880] FS:  0000000000000000(0000) GS:ffff96537bd00000(0000) knlGS:0000000000000000
> > [  222.380154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  222.380352] CR2: 0000000000000068 CR3: 0000000106bb4000 CR4: 0000000000752ef0
> > [  222.380596] PKRU: 55555554
> > [  222.380692] Kernel panic - not syncing: Fatal exception in interrupt
> > [  222.381450] Kernel Offset: 0xbc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [  222.381829] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > ```
>
> Thanks for your report. I believe this has the same root cause as
> #1098698, so I am going to merge both reports.
>
> If you have the possibility, please have a look at
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1098698#34
> and report back whether that fixes your issue.

That specific patch seems to address the
'kernel BUG at fs/netfs/read_collect.c:315!' issue,

not the NULL pointer dereference (the "segfault") reported here.
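For reference, the register dump in the original report shows why this crash is a NULL pointer dereference rather than the BUG_ON: the trapping instruction is `mov 0x68(%r11),%rsi`, R11 is 0, and CR2 (the faulting address) is 0x68. The short bash sketch below only redoes that address arithmetic with values copied from the dump; it does not tell which netfs structure R11 was supposed to point to.

``` bash
#!/usr/bin/env bash
# Recompute the faulting address of the trapping instruction
#   49 8b 73 68    mov 0x68(%r11),%rsi
# from the register values printed in the oops.
r11=0x0000000000000000   # R11 from the register dump (NULL)
disp=0x68                # displacement byte encoded in the instruction
cr2=0x0000000000000068   # CR2, the address the CPU failed to read

# Effective address of the load = base register + displacement.
ea=$(( r11 + disp ))
printf 'effective address: 0x%016x\n' "$ea"

if (( ea == cr2 )); then
    echo "matches CR2: read through a NULL pointer at offset 0x68"
fi
```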

> Max Kellermann has pointed
> out the open issues here:
> https://lore.kernel.org/netfs/CAKPOu+_WAM3RQJnHsKfEh5sG5tBuCPt1EWtoUFVC2ma=ORjHkg@mail.gmail.com/

It's hard to follow what has been merged upstream in which branch/version,
and what has been added in Debian. I'm not sure which patches I should apply.
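One way to try a single patch, such as the one linked from #1098698, on top of the Debian kernel is the `debian/bin/test-patches` script shipped in the `linux` source package, as described in the Debian Linux Kernel Handbook. A minimal sketch, assuming `deb-src` entries are enabled in APT and the patch has been saved as `fix.patch` in the current directory. That file name and the `run`/`build_patched_kernel` helpers are hypothetical. By default the sketch only prints the commands (DRY_RUN=1).

``` bash
#!/usr/bin/env bash
# Hypothetical helper: build a Debian kernel with one extra patch applied,
# following the Debian Linux Kernel Handbook procedure.
# Assumes deb-src entries are enabled and the patch is saved as ./fix.patch.
set -euo pipefail

# Print the command (dry run, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

build_patched_kernel() {
    # Path to the patch, relative to the unpacked source directory.
    local patch="$1"
    run sudo apt-get install build-essential fakeroot
    run sudo apt-get build-dep linux
    run apt-get source linux
    # The directory name depends on the source version, e.g. linux-6.12.17.
    run cd linux-*/
    run bash debian/bin/test-patches "$patch"
}

build_patched_kernel ../fix.patch
```

With DRY_RUN=0 the steps are actually executed; the resulting kernel package can then be installed and the `cp -r` test repeated.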

I tested the easily available versions in debian:

linux-image-6.12.12-amd64  6.12.12-1      -> this bug report
linux-image-6.12.17-amd64  6.12.17-1      -> identical behavior
linux-image-6.13-amd64     6.13.5-1~exp1  -> 'kernel BUG at fs/netfs/read_collect.c:316!'

6.12 is an LTS kernel; isn't there a repository where all proposed backports
are available?
The situation is rather bad right now, as no workaround is available.

Regards, Norbert.

