
Bug#1071501: linux-image-6.1.0-21-arm64: Linux NFS client hangs in nfs4_lookup_revalidate



Package: src:linux
Version: 6.1.90-1
Severity: normal
X-Debbugs-Cc: richard+debian+bugreport@kojedz.in

Dear Maintainer,

I am running Kubernetes on Debian, and the pods mount multiple NFS
shares. Dovecot processes run in the pods; they receive mail from
the internet and also serve as an IMAP server for clients. I
monitor my mail system by sending mails periodically (every 15 seconds)
and downloading them via IMAP. A few times I found a dovecot process
stuck in D state, and a reboot was always needed to recover from it.

Unfortunately, I was not able to trigger the bug quickly, and I don't
know which operations dovecot issues, or in what order, to trigger
this behavior. Until I get closer, I have set up a similar but smaller
environment with just a single dovecot instance that does the
same work: delivering only test mails locally and serving them via IMAP
to the monitoring client, with everything stored on NFS. Fortunately, this
also triggers the bug: after a few hours, one of the dovecot processes is
stuck in D state. The kernel also reports the blocked task:

May 19 12:16:49 k8s-node07 kernel: INFO: task lmtp:665683 blocked for more than 120 seconds.
May 19 12:16:49 k8s-node07 kernel:       Not tainted 6.1.0-21-arm64 #1 Debian 6.1.90-1
May 19 12:16:49 k8s-node07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 19 12:16:49 k8s-node07 kernel: task:lmtp            state:D stack:0     pid:665683 ppid:2881   flags:0x00000000
May 19 12:16:49 k8s-node07 kernel: Call trace:
May 19 12:16:49 k8s-node07 kernel:  __switch_to+0xf0/0x170
May 19 12:16:49 k8s-node07 kernel:  __schedule+0x340/0x940
May 19 12:16:49 k8s-node07 kernel:  schedule+0x58/0xf0
May 19 12:16:49 k8s-node07 kernel:  __nfs_lookup_revalidate+0x118/0x160 [nfs]
May 19 12:16:49 k8s-node07 kernel:  nfs4_lookup_revalidate+0x20/0x30 [nfs]
May 19 12:16:49 k8s-node07 kernel:  lookup_fast+0x138/0x150
May 19 12:16:49 k8s-node07 kernel:  walk_component+0x30/0x1a0
May 19 12:16:49 k8s-node07 kernel:  path_lookupat+0x80/0x1a4
May 19 12:16:49 k8s-node07 kernel:  filename_lookup+0xb4/0x1b0
May 19 12:16:49 k8s-node07 kernel:  vfs_statx+0x94/0x19c
May 19 12:16:49 k8s-node07 kernel:  vfs_fstatat+0x68/0x90
May 19 12:16:49 k8s-node07 kernel:  __do_sys_newfstatat+0x58/0xa0
May 19 12:16:49 k8s-node07 kernel:  __arm64_sys_newfstatat+0x28/0x34
May 19 12:16:49 k8s-node07 kernel:  invoke_syscall+0x78/0x100
May 19 12:16:49 k8s-node07 kernel:  el0_svc_common.constprop.0+0x4c/0xf4
May 19 12:16:49 k8s-node07 kernel:  do_el0_svc+0x34/0xd0
May 19 12:16:49 k8s-node07 kernel:  el0_svc+0x34/0xd4
May 19 12:16:49 k8s-node07 kernel:  el0t_64_sync_handler+0xf4/0x120
May 19 12:16:49 k8s-node07 kernel:  el0t_64_sync+0x18c/0x190

Or, for another process:

May 20 04:50:01 k8s-node07 kernel: INFO: task imap:8337 blocked for more than 120 seconds.
May 20 04:50:01 k8s-node07 kernel:       Not tainted 6.1.0-21-arm64 #1 Debian 6.1.90-1
May 20 04:50:01 k8s-node07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 04:50:01 k8s-node07 kernel: task:imap            state:D stack:0     pid:8337  ppid:3164   flags:0x00000000
May 20 04:50:01 k8s-node07 kernel: Call trace:
May 20 04:50:01 k8s-node07 kernel:  __switch_to+0xf0/0x170
May 20 04:50:01 k8s-node07 kernel:  __schedule+0x340/0x940
May 20 04:50:01 k8s-node07 kernel:  schedule+0x58/0xf0
May 20 04:50:01 k8s-node07 kernel:  __nfs_lookup_revalidate+0x118/0x160 [nfs]
May 20 04:50:01 k8s-node07 kernel:  nfs4_lookup_revalidate+0x20/0x30 [nfs]
May 20 04:50:01 k8s-node07 kernel:  lookup_fast+0x138/0x150
May 20 04:50:01 k8s-node07 kernel:  walk_component+0x30/0x1a0
May 20 04:50:01 k8s-node07 kernel:  path_lookupat+0x80/0x1a4
May 20 04:50:01 k8s-node07 kernel:  filename_lookup+0xb4/0x1b0
May 20 04:50:01 k8s-node07 kernel:  vfs_statx+0x94/0x19c
May 20 04:50:01 k8s-node07 kernel:  vfs_fstatat+0x68/0x90
May 20 04:50:01 k8s-node07 kernel:  __do_sys_newfstatat+0x58/0xa0
May 20 04:50:01 k8s-node07 kernel:  __arm64_sys_newfstatat+0x28/0x34
May 20 04:50:01 k8s-node07 kernel:  invoke_syscall+0x78/0x100
May 20 04:50:01 k8s-node07 kernel:  el0_svc_common.constprop.0+0x4c/0xf4
May 20 04:50:01 k8s-node07 kernel:  do_el0_svc+0x34/0xd0
May 20 04:50:01 k8s-node07 kernel:  el0_svc+0x34/0xd4
May 20 04:50:01 k8s-node07 kernel:  el0t_64_sync_handler+0xf4/0x120
May 20 04:50:01 k8s-node07 kernel:  el0t_64_sync+0x18c/0x190
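
To compare such reports across tasks and days, the hung-task messages can
be reduced to (task, pid, seconds, call chain). The following is only a
minimal sketch, assuming the journal line layout shown above; the names
parse_hung_tasks, HUNG_RE and FRAME_RE are made up for this illustration.

```python
import re

# Matches e.g. "INFO: task lmtp:665683 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)
# Matches a call-trace frame such as "kernel:  schedule+0x58/0xf0"
FRAME_RE = re.compile(
    r"kernel:\s+(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(lines):
    """Group hung-task reports and collect the symbols of their call traces."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame["sym"])
    return reports


if __name__ == "__main__":
    import sys

    # Usage (assumed): journalctl -k | python3 this_script.py
    for report in parse_hung_tasks(sys.stdin):
        chain = " <- ".join(report["frames"][:6])
        print(f'{report["comm"]}:{report["pid"]} {report["secs"]}s {chain}')
```

Both traces above reduce to the same chain ending in
__nfs_lookup_revalidate, which is why I treat them as one bug.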


Of course the NFS server is running, and other NFS mounts are still
working from the node. Also, this only started to happen with Debian's
kernel. Before that, I was compiling my own upstream kernel, version
5.15, and with that I never experienced such a lockup.

Unfortunately, I don't know how to go further or how to collect more
relevant debugging information.
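
One thing that could be captured the next time a process hangs is the
list of D-state tasks with their wait channel and kernel stack. Below is a
minimal sketch of that, not something I have run against the hung system:
the helper names are made up, it only looks at the main thread of each
process (not /proc/<pid>/task/*), and reading /proc/<pid>/stack normally
needs root and a kernel built with stack trace support.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def read_optional(path):
    """Read a file, returning '' if it is missing or not readable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def d_state_tasks(proc="/proc"):
    """List (pid, comm, wchan, stack) for processes in uninterruptible sleep."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        stat = read_optional(os.path.join(proc, entry, "stat"))
        if not stat:
            continue  # process exited while we were scanning
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_optional(os.path.join(proc, entry, "wchan"))
            stack = read_optional(os.path.join(proc, entry, "stack"))
            tasks.append((pid, comm, wchan, stack))
    return tasks


if __name__ == "__main__":
    for pid, comm, wchan, stack in d_state_tasks():
        print(f"{pid} {comm} wchan={wchan or '?'}")
        if stack:
            print(stack)
```

Running it periodically from cron or a loop would show whether the task is
always parked in the same function and whether other tasks pile up behind
it.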

I expect that dovecot, being just an application, should not be able to
cause any kernel-side lockups. In my test lab, this specific NFS mount is
mounted on only one machine, so this really suggests a Linux NFS
client-side issue, not one related to cache coherency between multiple
clients.

-- Package-specific info:
** Version:
Linux version 6.1.0-21-arm64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP Debian 6.1.90-1 (2024-05-03)

** Command line:
net.ifnames=0 console=ttyS2,1500000 console=tty1 root=UUID=b4ff4167-1fe9-4fd6-9b9c-c3c68d98108b rw rootwait panic=10

** Not tainted

** Kernel log:
May 20 04:52:02 k8s-node07 kernel: INFO: task imap:8337 blocked for more than 241 seconds.
May 20 04:52:02 k8s-node07 kernel:       Not tainted 6.1.0-21-arm64 #1 Debian 6.1.90-1
May 20 04:52:02 k8s-node07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 04:52:02 k8s-node07 kernel: task:imap            state:D stack:0     pid:8337  ppid:3164   flags:0x00000000
May 20 04:52:02 k8s-node07 kernel: Call trace:
May 20 04:52:02 k8s-node07 kernel:  __switch_to+0xf0/0x170
May 20 04:52:02 k8s-node07 kernel:  __schedule+0x340/0x940
May 20 04:52:02 k8s-node07 kernel:  schedule+0x58/0xf0
May 20 04:52:02 k8s-node07 kernel:  __nfs_lookup_revalidate+0x118/0x160 [nfs]
May 20 04:52:02 k8s-node07 kernel:  nfs4_lookup_revalidate+0x20/0x30 [nfs]
May 20 04:52:02 k8s-node07 kernel:  lookup_fast+0x138/0x150
May 20 04:52:02 k8s-node07 kernel:  walk_component+0x30/0x1a0
May 20 04:52:02 k8s-node07 kernel:  path_lookupat+0x80/0x1a4
May 20 04:52:02 k8s-node07 kernel:  filename_lookup+0xb4/0x1b0
May 20 04:52:02 k8s-node07 kernel:  vfs_statx+0x94/0x19c
May 20 04:52:02 k8s-node07 kernel:  vfs_fstatat+0x68/0x90
May 20 04:52:02 k8s-node07 kernel:  __do_sys_newfstatat+0x58/0xa0
May 20 04:52:02 k8s-node07 kernel:  __arm64_sys_newfstatat+0x28/0x34
May 20 04:52:02 k8s-node07 kernel:  invoke_syscall+0x78/0x100
May 20 04:52:02 k8s-node07 kernel:  el0_svc_common.constprop.0+0x4c/0xf4
May 20 04:52:02 k8s-node07 kernel:  do_el0_svc+0x34/0xd0
May 20 04:52:02 k8s-node07 kernel:  el0_svc+0x34/0xd4
May 20 04:52:02 k8s-node07 kernel:  el0t_64_sync_handler+0xf4/0x120
May 20 04:52:02 k8s-node07 kernel:  el0t_64_sync+0x18c/0x190

** Model information

** Loaded modules:
sd_mod
t10_pi
crc64_rocksoft_generic
crc64_rocksoft
crc_t10dif
crct10dif_generic
crc64
sg
iscsi_tcp
libiscsi_tcp
libiscsi
scsi_transport_iscsi
scsi_mod
scsi_common
nf_conntrack_netlink
rpcsec_gss_krb5
auth_rpcgss
nfsv4
dns_resolver
nfs
lockd
grace
fscache
netfs
nft_log
nft_limit
xt_limit
xt_NFLOG
nfnetlink_log
xt_physdev
xt_TCPMSS
xt_tcpudp
xt_mark
xt_multiport
xt_addrtype
dummy
ipt_REJECT
nf_reject_ipv4
ip_set_hash_ipport
nft_chain_nat
xt_nat
xt_MASQUERADE
xt_ipvs
nf_nat
xt_set
ip_set_hash_ip
ip_set_hash_net
ip_set
veth
xt_conntrack
xt_comment
nft_compat
nf_tables
nfnetlink
overlay
sunrpc
binfmt_misc
evdev
aes_ce_blk
snd_soc_rk817
aes_ce_cipher
polyval_ce
snd_soc_core
polyval_generic
snd_pcm_dmaengine
ext4
ghash_ce
gf128mul
sha2_ce
leds_gpio
snd_pcm
sha256_arm64
sha1_ce
rockchip_thermal
crc16
mbcache
snd_timer
jbd2
snd
dw_wdt
soundcore
rk817_charger
rk805_pwrkey
cpufreq_dt
br_netfilter
bridge
stp
llc
ip_vs_sh
ip_vs_wrr
ip_vs_rr
ip_vs
nf_conntrack
nf_defrag_ipv6
nf_defrag_ipv4
drm
loop
fuse
efi_pstore
dm_mod
dax
configfs
ip_tables
x_tables
autofs4
xfs
libcrc32c
crc32c_generic
realtek
rk808_regulator
fan53555
dwmac_rk
stmmac_platform
stmmac
pcs_xpcs
spi_rockchip
phylink
dw_mmc_rockchip
dw_mmc_pltfm
of_mdio
dw_mmc
fixed
crct10dif_ce
crct10dif_common
fixed_phy
fwnode_mdio
pl330
i2c_rk3x
io_domain
libphy

** PCI devices:
not available

** USB devices:
not available


-- System Information:
Debian Release: 12.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: arm64 (aarch64)

Kernel: Linux 6.1.0-21-arm64 (SMP w/4 CPU threads)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages linux-image-6.1.0-21-arm64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.142
ii  kmod                                    30+20221128-1
ii  linux-base                              4.9

Versions of packages linux-image-6.1.0-21-arm64 recommends:
ii  apparmor             3.0.8-3
ii  firmware-linux-free  20200122-1

Versions of packages linux-image-6.1.0-21-arm64 suggests:
pn  debian-kernel-handbook  <none>
pn  linux-doc-6.1           <none>

Versions of packages linux-image-6.1.0-21-arm64 is related to:
pn  firmware-amd-graphics     <none>
pn  firmware-atheros          <none>
pn  firmware-bnx2             <none>
pn  firmware-bnx2x            <none>
pn  firmware-brcm80211        <none>
pn  firmware-cavium           <none>
pn  firmware-intel-sound      <none>
pn  firmware-intelwimax       <none>
pn  firmware-ipw2x00          <none>
pn  firmware-ivtv             <none>
pn  firmware-iwlwifi          <none>
pn  firmware-libertas         <none>
pn  firmware-linux-nonfree    <none>
pn  firmware-misc-nonfree     <none>
pn  firmware-myricom          <none>
pn  firmware-netxen           <none>
pn  firmware-qlogic           <none>
pn  firmware-realtek          <none>
pn  firmware-samsung          <none>
pn  firmware-siano            <none>
pn  firmware-ti-connectivity  <none>
pn  xen-hypervisor            <none>

-- no debconf information

