
Bug#988044: linux-image-5.10.0-0.bpo.4-amd64: Kernel panic due nfsd error: refcount_t: addition on 0; use-after-free



Hi Salvatore,

Sorry for my very very late answer.

I tried many things to reproduce and/or fix this problem. We upgraded our kernel to the latest backport version for Buster, but the issue was still there. We upgraded to Bullseye at the end of October, but today this issue occurred again.

I think it is related to https://lore.kernel.org/linux-nfs/YV3vDQOPVgxc%2FJ99@eldamar.lan/

I have just sent an email to that thread.

Something that may be important: our groups of servers are used to host websites, some with only one website, some with many. So far, this issue has only occurred on NFS servers that are part of a group with many websites. But one of the most heavily loaded NFS servers serves only one website and has never had this issue. Maybe this issue only occurs when the NFS server is serving many different files.
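For context on the first message in the trace quoted below: "refcount_t: addition on 0" is the kernel refusing to bump a reference count that has already dropped to zero, because at that point the object may already have been freed. The following is only a minimal userspace sketch of that semantics (the names mirror the kernel's refcount API, but this is not the kernel's actual implementation, and the check is simplified):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified userspace model of the kernel's refcount_t sanity check
 * (lib/refcount.c): once a count has reached 0 the object is considered
 * released, so a further increment is flagged as a use-after-free
 * instead of silently resurrecting the object. */
struct refcount { atomic_int refs; };

static bool refcount_inc_not_zero(struct refcount *r)
{
    int old = atomic_load(&r->refs);
    do {
        if (old == 0)
            return false;   /* count already hit zero: object may be freed */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
    return true;
}

static void refcount_inc(struct refcount *r)
{
    /* On a live object this just increments; on a dead one it warns,
     * like the WARNING at lib/refcount.c:25 in the trace. */
    if (!refcount_inc_not_zero(r))
        fprintf(stderr, "refcount_t: addition on 0; use-after-free.\n");
}
```

In the trace, nfsd_break_deleg_cb hit exactly this case: it tried to take a reference on a delegation whose count was already zero.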

Regards,

Olivier

-----Original Message-----
From: Salvatore Bonaccorso <salvatore.bonaccorso@gmail.com> On Behalf Of Salvatore Bonaccorso
Sent: Tuesday, May 4, 2021 20:47
To: Olivier Monaco <olivier@bm-services.com>; 988044@bugs.debian.org
Objet : Re: Bug#988044: linux-image-5.10.0-0.bpo.4-amd64: Kernel panic due nfsd error: refcount_t: addition on 0; use-after-free

Control: tags -1 + moreinfo

Hi Olivier,

On Tue, May 04, 2021 at 10:01:17AM +0200, Olivier Monaco wrote:
> Package: src:linux
> Version: 5.10.19-1~bpo10+1
> Severity: important
> 
> On a virtual machine running a NFS server the following kernel panic occurs:
> 2021-05-04T02:28:21.051193+02:00 storage-t20 kernel: [1736623.921391] 
> ------------[ cut here ]------------
> 2021-05-04T02:28:21.051214+02:00 storage-t20 kernel: [1736623.921406] refcount_t: addition on 0; use-after-free.
> 2021-05-04T02:28:21.051215+02:00 storage-t20 kernel: [1736623.921416] 
> WARNING: CPU: 0 PID: 675 at lib/refcount.c:25 
> refcount_warn_saturate+0x6d/0xf0
> 2021-05-04T02:28:21.051216+02:00 storage-t20 kernel: [1736623.921417] 
> Modules linked in: binfmt_misc vsock_loopback 
> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock 
> intel_rapl_msr intel_rapl_common nfit libnvdimm crc32_pclmul 
> ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd glue_helper 
> rapl vmw_balloon vmwgfx joydev evdev serio_raw pcspkr ttm sg 
> drm_kms_helper vmw_vmci cec ac button nfsd auth_rpcgss nfs_acl lockd 
> grace drm sunrpc fuse configfs ip_tables x_tables autofs4 btrfs 
> blake2b_generic xor raid6_pq libcrc32c crc32c_generic dm_mod sd_mod 
> t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul  
> crct10dif_common crc32c_intel psmouse vmxnet3 ata_piix libata 
> vmw_pvscsi scsi_mod i2c_piix4
> 2021-05-04T02:28:21.051217+02:00 storage-t20 kernel: [1736623.921488] 
> CPU: 0 PID: 675 Comm: nfsd Not tainted 5.10.0-0.bpo.4-amd64 #1 Debian 
> 5.10.19-1~bpo10+1
> 2021-05-04T02:28:21.051218+02:00 storage-t20 kernel: [1736623.921488] 
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop 
> Reference Platform, BIOS 6.00 12/12/2018
> 2021-05-04T02:28:21.051219+02:00 storage-t20 kernel: [1736623.921491] 
> RIP: 0010:refcount_warn_saturate+0x6d/0xf0
> 2021-05-04T02:28:21.051219+02:00 storage-t20 kernel: [1736623.921492] 
> Code: 05 d8 be 3f 01 01 e8 c3 0a 40 00 0f 0b c3 80 3d c8 be 3f 01 00 
> 75 ce 48 c7 c7 30 6c 92 86 c6 05 b8 be 3f 01 01 e8 a4 0a 40 00 <0f> 0b 
> c3 80 3d ab be 3f 01 00 75 af 48 c7 c7 08 6c 92 86 c6 05 9b
> 2021-05-04T02:28:21.051220+02:00 storage-t20 kernel: [1736623.921493] 
> RSP: 0018:ffffb93f412b3c28 EFLAGS: 00010282
> 2021-05-04T02:28:21.051234+02:00 storage-t20 kernel: [1736623.921494] 
> RAX: 0000000000000000 RBX: ffff9c2c913a0f80 RCX: 0000000000000027
> 2021-05-04T02:28:21.051236+02:00 storage-t20 kernel: [1736623.921495] 
> RDX: 0000000000000027 RSI: ffff9c2d39e18a00 RDI: ffff9c2d39e18a08
> 2021-05-04T02:28:21.051237+02:00 storage-t20 kernel: [1736623.921495] 
> RBP: ffff9c2c96e4f2a4 R08: 0000000000000000 R09: c0000000ffff7fff
> 2021-05-04T02:28:21.051238+02:00 storage-t20 kernel: [1736623.921496] 
> R10: 0000000000000001 R11: ffffb93f412b3a30 R12: ffff9c2c96e4f2a0
> 2021-05-04T02:28:21.051238+02:00 storage-t20 kernel: [1736623.921496] 
> R13: ffff9c2c375f5450 R14: ffff9c2cb4f9fde8 R15: ffffffff86f75300
> 2021-05-04T02:28:21.051244+02:00 storage-t20 kernel: [1736623.921497] 
> FS:  0000000000000000(0000) GS:ffff9c2d39e00000(0000) 
> knlGS:0000000000000000
> 2021-05-04T02:28:21.051245+02:00 storage-t20 kernel: [1736623.921498] 
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-05-04T02:28:21.051245+02:00 storage-t20 kernel: [1736623.921499] 
> CR2: 00007f424807d5b9 CR3: 0000000103084001 CR4: 00000000007706f0
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921522] 
> PKRU: 55555554
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921523] Call Trace:
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921546]  
> nfsd_break_deleg_cb+0xb5/0xc0 [nfsd]
> 2021-05-04T02:28:21.051247+02:00 storage-t20 kernel: [1736623.921553]  
> __break_lease+0x148/0x500
> 2021-05-04T02:28:21.051249+02:00 storage-t20 kernel: [1736623.921564]  
> ? fill_pre_wcc+0x8f/0x180 [nfsd]
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921566]  
> notify_change+0x196/0x4c0
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921575]  
> ? nfsd_setattr+0x2e6/0x470 [nfsd]
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921586]  
> nfsd_setattr+0x2e6/0x470 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921597]  
> nfsd4_setattr+0x7b/0x140 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921611]  
> nfsd4_proc_compound+0x355/0x680 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921623]  
> nfsd_dispatch+0xd4/0x180 [nfsd]
> 2021-05-04T02:28:21.051253+02:00 storage-t20 kernel: [1736623.921661]  
> svc_process_common+0x390/0x6c0 [sunrpc]
> 2021-05-04T02:28:21.051253+02:00 storage-t20 kernel: [1736623.921680]  
> ? svc_recv+0x3c4/0x8a0 [sunrpc]
> 2021-05-04T02:28:21.051254+02:00 storage-t20 kernel: [1736623.921688]  
> ? nfsd_svc+0x300/0x300 [nfsd]
> 2021-05-04T02:28:21.051254+02:00 storage-t20 kernel: [1736623.921695]  
> ? nfsd_destroy+0x60/0x60 [nfsd]
> 2021-05-04T02:28:21.051255+02:00 storage-t20 kernel: [1736623.921710]  
> svc_process+0xb7/0xf0 [sunrpc]
> 2021-05-04T02:28:21.051255+02:00 storage-t20 kernel: [1736623.921734]  
> nfsd+0xe8/0x140 [nfsd]
> 2021-05-04T02:28:21.051257+02:00 storage-t20 kernel: [1736623.921737]  
> kthread+0x116/0x130
> 2021-05-04T02:28:21.051258+02:00 storage-t20 kernel: [1736623.921738]  
> ? kthread_park+0x80/0x80
> 2021-05-04T02:28:21.051258+02:00 storage-t20 kernel: [1736623.921741]  
> ret_from_fork+0x1f/0x30
> 2021-05-04T02:28:21.051259+02:00 storage-t20 kernel: [1736623.921743] 
> ---[ end trace f6e153631af275dc ]---
> 
> It is followed by:
> 2021-05-04T02:28:21.101162+02:00 storage-t20 kernel: [1736623.971161] list_add corruption. prev->next should be next (ffff9c2d0875ecb8), but was ffff9c2c913a0fe8. (prev=ffff9c2c913a0fe8).
> 2021-05-04T02:28:21.101176+02:00 storage-t20 kernel: [1736623.971315] 
> ------------[ cut here ]------------
> 2021-05-04T02:28:21.101177+02:00 storage-t20 kernel: [1736623.971317] kernel BUG at lib/list_debug.c:28!
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971362] invalid opcode: 0000 [#1] SMP NOPTI
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971402] CPU: 1 PID: 2435711 Comm: kworker/u256:5 Tainted: G        W         5.10.0-0.bpo.4-amd64 #1 Debian 5.10.19-1~bpo10+1
> 2021-05-04T02:28:21.101179+02:00 storage-t20 kernel: [1736623.971456] 
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop 
> Reference Platform, BIOS 6.00 12/12/2018
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971499] 
> Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971515] 
> RIP: 0010:__list_add_valid.cold.0+0x26/0x28
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971527] 
> Code: 7b 1c bf ff 48 89 d1 48 c7 c7 18 71 92 86 48 89 c2 e8 02 2a ff 
> ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 70 71 92 86 e8 ee 29 ff ff <0f> 0b 
> 48 89 fe 48 89 c2 48 c7 c7 00 72 92 86 e8 da 29 ff ff 0f 0b
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971564] 
> RSP: 0018:ffffb93f4075fe48 EFLAGS: 00010246
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971579] 
> RAX: 0000000000000075 RBX: ffff9c2c913a0fe8 RCX: 0000000000000000
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971594] 
> RDX: 0000000000000000 RSI: ffff9c2d39e58a00 RDI: ffff9c2d39e58a00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971608] 
> RBP: ffff9c2c913a1018 R08: 0000000000000000 R09: c0000000ffff7fff
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971623] 
> R10: 0000000000000001 R11: ffffb93f4075fc58 R12: ffff9c2d0875ec00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971637] 
> R13: ffff9c2c913a0fe8 R14: ffff9c2d0875ecb8 R15: ffff9c2c913a1050
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971653] 
> FS:  0000000000000000(0000) GS:ffff9c2d39e40000(0000) 
> knlGS:0000000000000000
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971684] 
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971735] 
> CR2: 00007f8d98002698 CR3: 0000000103904005 CR4: 00000000007706e0
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971774] 
> PKRU: 55555554
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971781] Call Trace:
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971805]  
> nfsd4_cb_recall_prepare+0x2aa/0x2f0 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971829]  
> nfsd4_run_cb_work+0xe9/0x150 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971843]  
> process_one_work+0x1aa/0x340
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971855]  
> ? create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971865]  
> worker_thread+0x30/0x390
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.971875]  
> ? create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972279]  
> kthread+0x116/0x130
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972663]  
> ? kthread_park+0x80/0x80
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973043]  
> ret_from_fork+0x1f/0x30
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973411] 
> Modules linked in: binfmt_misc vsock_loopback 
> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock 
> intel_rapl_msr intel_rapl_common nfit libnvdimm crc32_pclmul 
> ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd glue_helper 
> rapl vmw_balloon vmwgfx joydev evdev serio_raw pcspkr ttm sg 
> drm_kms_helper vmw_vmci cec ac button nfsd auth_rpcgss nfs_acl lockd 
> grace drm sunrpc fuse configfs ip_tables x_tables autofs4 btrfs 
> blake2b_generic xor raid6_pq libcrc32c crc32c_generic dm_mod sd_mod 
> t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul 
> crct10dif_common crc32c_intel psmouse vmxnet3 ata_piix libata 
> vmw_pvscsi scsi_mod i2c_piix4
> 2021-05-04T02:28:21.101190+02:00 storage-t20 kernel: [1736623.976175] 
> ---[ end trace f6e153631af275dd ]---
> 
> We are running a VMware vSphere platform hosting 9 groups of virtual machines. Each group includes one VM with NFS for file sharing and 3 VMs with NFS clients, so we are running 9 independent file servers. This issue occurred on 2 different file servers with the same kernel version and the same error. There is no direct link between the two servers except that they are running the same software, on the same hardware, for the same purpose.
> 
> It also occurred earlier, 4 times on 3 different servers which were running kernel 5.10.13-1~bpo10+1 (package linux-image-5.10.0-0.bpo.3-amd64).
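The second oops quoted above fires in the kernel's list debugging (lib/list_debug.c, CONFIG_DEBUG_LIST): before linking a node between two neighbours, the kernel verifies the neighbours still point at each other. The message "prev->next should be next (...), but was ... (prev=...)" means prev->next already pointed at the entry being inserted, i.e. the delegation's list node was being added twice, which fits the use-after-free reported just before. A hedged userspace sketch of that check (approximating the kernel's logic and message format, not reproducing it exactly):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace sketch of the __list_add_valid() check from
 * lib/list_debug.c: a doubly linked list node is only inserted
 * between prev and next if the list is still self-consistent. */
struct list_head { struct list_head *next, *prev; };

static bool list_add_valid(struct list_head *new_entry,
                           struct list_head *prev,
                           struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), but was %p.\n",
                (void *)prev, (void *)next->prev);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p)\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    if (new_entry == prev || new_entry == next) {
        /* the case in the log: the entry is already on the list */
        fprintf(stderr, "list_add double add.\n");
        return false;
    }
    return true;
}
```

In the real kernel this check ends in BUG(), which is the "kernel BUG at lib/list_debug.c:28!" line and the invalid-opcode oops that follows it.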

From the above report I suspect you do not have an easy way to trigger the issue, right? Did you see the issue as well with the most current version in buster-backports, 5.10.24-1~bpo10+1?

Ideally I think this issue should just be forwarded upstream, but keep us in the loop accordingly. Could you do that?

I did not immediately find anything similar on https://lore.kernel.org/linux-nfs/ (there is a recent one about doing inter-server copy, but that is/looks different from this one).

So this would mean mailing:

"J. Bruce Fields" <bfields@fieldses.org> (supporter: KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
Chuck Lever <chuck.lever@oracle.com> (supporter: KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-nfs@vger.kernel.org (open list: KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-kernel@vger.kernel.org (open list)

Regards,
Salvatore

