Bug#867595: nfs-kernel-server: nfsd freezes if underlying ext4 file system becomes full
Package: nfs-kernel-server
Version: 1:1.2.8-9
Severity: critical
Justification: breaks the whole system
Dear Maintainer,
after writing to an NFS export until the underlying ext4 file system
became full (no more space available), nfsd hangs with this trace:
Mai 30 02:50:16 itmserver2 kernel: INFO: task nfsd:4242 blocked for more than 120 seconds.
Mai 30 02:50:16 itmserver2 kernel: Not tainted 3.16.0-4-amd64 #1
Mai 30 02:50:16 itmserver2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mai 30 02:50:16 itmserver2 kernel: nfsd D ffff8807f0202f780 4242 2 0x00000000
Mai 30 02:50:16 itmserver2 kernel: ffff8807f0202b20 0000000000000046 0000000000012f40 ffff8807ae473fd8
Mai 30 02:50:16 itmserver2 kernel: 0000000000012f40 ffff8807f0202b20 ffff8803dd0b4940 ffff8807ae473a80
Mai 30 02:50:16 itmserver2 kernel: ffff8803dd0b4944 ffff8807f0202b20 00000000ffffffff ffff8803dd0b4948
Mai 30 02:50:16 itmserver2 kernel: Call Trace:
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81517aa5>] ?schedule_preempt_disabled+0x25/0x70
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81519503>] ?__mutex_lock_slowpath+0xd3/0x1c0
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8151960b>] ?mutex_lock+0x1b/0x2a
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa02d7ac9>] ?ext4_file_write_iter+0x79/0x3a0 [ext4]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81190384>] ?cache_grow+0x154/0x240
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff811aa420>] ?new_sync_read+0xa0/0xa0
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff811aa51f>] ?do_iter_readv_writev+0x5f/0x90
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff811aba7b>] ?do_readv_writev+0xbb/0x240
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa02d7a50>] ?ext4_unwritten_wait+0xa0/0xa0 [ext4]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa02d7a50>] ?ext4_unwritten_wait+0xa0/0xa0 [ext4]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8121ff55>] ?exportfs_decode_fh+0x95/0x2c0
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8108faff>] ?groups_alloc+0x2f/0xe0
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa06298b9>] ?nfsd_vfs_write.isra.12+0x99/0x360 [nfsd]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa062cf59>] ?nfsd_write+0x89/0x110 [nfsd]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa06371ef>] ?nfsd4_write+0x1bf/0x220 [nfsd]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa0638b18>] ?nfsd4_proc_compound+0x4e8/0x7e0 [nfsd]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa0625d32>] ?nfsd_dispatch+0xb2/0x200 [nfsd]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa05d5d71>] ?svc_process_common+0x451/0x6e0 [sunrpc]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa0625630>] ?nfsd_destroy+0x70/0x70 [nfsd]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa05d610c>] ?svc_process+0x10c/0x160 [sunrpc]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa06256ef>] ?nfsd+0xbf/0x130 [nfsd]
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff810894fd>] ?kthread+0xbd/0xe0
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81089440>] ?kthread_create_on_node+0x180/0x180
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8151ad98>] ?ret_from_fork+0x58/0x90
Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81089440>] ?kthread_create_on_node+0x180/0x180
The filesystem in question:
dumpe2fs 1.42.12 (29-Aug-2014)
Filesystem volume name: space
Last mounted on: /srv/space
Filesystem UUID: ba1e6b57-d1ae-458f-850c-32894eb37a97
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 536870912
Block count: 4294967295
Reserved block count: 375809
Free blocks: 17076270
Free inodes: 533985328
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 4096
Inode blocks per group: 256
RAID stride: 128
RAID stripe width: 1280
Flex block group size: 16
Filesystem created: Wed May 27 13:50:23 2015
Last mount time: Tue May 30 09:50:51 2017
Last write time: Tue May 30 09:50:51 2017
Mount count: 1
Maximum mount count: -1
Last checked: Tue May 30 09:33:30 2017
Check interval: 0 (<none>)
Lifetime writes: 219 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: b2a64a16-0afc-40aa-8a92-a5a65fca1d5a
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x007ab78e
Journal start: 31676
Please note the size of 16 TiB, which is the maximum with 32-bit block
numbers and a block size of 4096 bytes. Maybe this is important.
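
The size can be checked from the dumpe2fs values above. This is only a rough sketch: it assumes the limit in question is the largest value a 32-bit block number can hold, and the helper name fs_size_tib is made up for illustration.

```python
# Values copied from the dumpe2fs output above.
BLOCK_COUNT = 4294967295   # "Block count"
BLOCK_SIZE = 4096          # "Block size", in bytes


def fs_size_tib(block_count: int, block_size: int) -> float:
    """Return the file system size in TiB (hypothetical helper)."""
    return block_count * block_size / 2**40


size_tib = fs_size_tib(BLOCK_COUNT, BLOCK_SIZE)

# Assumption: the relevant limit is the largest 32-bit unsigned value.
max_32bit_blocks = 2**32 - 1
at_32bit_limit = BLOCK_COUNT == max_32bit_blocks

print(f"file system size: {size_tib:.2f} TiB")
print(f"block count equals 2**32 - 1: {at_32bit_limit}")
```

So the file system sits exactly at that block count, while the logical volume shown below is 17.00 TiB.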
The filesystem was created on a logical volume
--- Logical volume ---
LV Path /dev/md0_VG/space
LV Name space
VG Name md0_VG
LV UUID 63JVuX-yu5E-qsrw-zCP0-X0S8-Erkv-ZYTxwk
LV Write Access read/write
LV Creation host, time itmserver2, 2015-05-27 13:23:24 +0200
LV Status available
# open 1
LV Size 17.00 TiB
Current LE 34816
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 20480
Block device 254:1
and resized several times online. The LV is part of a VG
--- Volume group ---
VG Name md0_VG
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 59
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 5
Open LV 5
Max PV 0
Cur PV 1
Act PV 1
VG Size 36.39 TiB
PE Size 512.00 MiB
Total PE 74517
Alloc PE / Size 52324 / 25.55 TiB
Free PE / Size 22193 / 10.84 TiB
VG UUID ARRCBo-se6D-zNRI-Uz6m-f3Bc-0D2L-PdHGrO
which consists of only one physical volume
--- Physical volume ---
PV Name /dev/md0
VG Name md0_VG
PV Size 36.39 TiB / not usable 491.00 MiB
Allocatable yes
PE Size 512.00 MiB
Total PE 74517
Free PE 22193
Allocated PE 52324
PV UUID 7edwwI-7bVb-dDN8-wdXc-Je9R-J5Kn-KvPaXF
This physical volume is a RAID 6.
Our NFS server uses Debian jessie (old stable) with kernel
Linux itmserver2 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2 (2017-04-30) x86_64 GNU/Linux
After the freeze, the system was rendered unstable with weird behavior
when accessing the file system.
Restarting the nfsd service didn't help, and unloading the NFS kernel
module wasn't possible since it was still in use by the NFS daemons, which
were unkillable. The lsof command did not respond either.
Only a restart of the server was possible at
this point, which is why the bug is reported as critical. Before the restart, we
noticed high CPU usage by the process kworker/0:3 and high I/O (observed with iotop)
by jbd2/sda5-8. Note that /dev/sda5 holds only the /var/ directory and
is on a completely separate hard disk from the NFS export.
After the restart we checked the file system:
itmserver2:~ # fsck -f /dev/md0_VG/space
fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
space: recovering journal
Clearing orphaned inode 253823871 (uid=3144, gid=3100, mode=0100640,
size=233609)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (13130177, counted=17076270).
Fix<y>? yes
Free inodes count wrong (533998561, counted=533985328).
Fix<y>? yes
The freeze has happened twice so far. Since it was a production system
and the freeze rendered many clients unusable, we did not have time to analyse the
problem further and restarted the server instead. Each time, a user
reported that a script on a client had written large quantities of data to the NFS
mount. Indeed, the file system was _almost_ full upon inspection (0.5 %
free, while the reserved blocks are only 0.01 %), so it
may have reached its limit and then aborted writing the last file,
leaving the file system almost full. We assume that the freeze
of nfsd happened when the file system limit was reached.
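
For reference, the percentages can be recomputed from the block counts quoted in the dumpe2fs and e2fsck output above. This is a small sketch only. Those counts appear to have been recorded after the reboot, so they need not match the 0.5 % seen during inspection. The helper name percent is made up for illustration.

```python
# Block counts copied from the dumpe2fs and e2fsck output above.
BLOCK_COUNT = 4294967295       # dumpe2fs "Block count"
RESERVED_BLOCKS = 375809       # dumpe2fs "Reserved block count"
FREE_BEFORE_FSCK = 13130177    # e2fsck: stored (wrong) free blocks count
FREE_AFTER_FSCK = 17076270     # e2fsck: counted free blocks


def percent(part: int, total: int) -> float:
    """Return part as a percentage of total (hypothetical helper)."""
    return 100.0 * part / total


reserved_pct = percent(RESERVED_BLOCKS, BLOCK_COUNT)
free_before_pct = percent(FREE_BEFORE_FSCK, BLOCK_COUNT)
free_after_pct = percent(FREE_AFTER_FSCK, BLOCK_COUNT)

print(f"reserved blocks:        {reserved_pct:.4f} %")
print(f"free blocks (stored):   {free_before_pct:.4f} %")
print(f"free blocks (counted):  {free_after_pct:.4f} %")
```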
The expected outcome would be to raise an I/O error because of the full
file system instead of freezing.
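
To illustrate the kind of error meant here, the following sketch shows how a write to a full device normally fails on Linux. It assumes /dev/full, a special device whose writes always fail with ENOSPC, as a stand-in for a full ext4 file system. It does not reproduce the NFS code path or the hang. The function name try_write is made up for illustration.

```python
import errno
import os


def try_write(path: str, data: bytes) -> str:
    """Write data to path; return "ENOSPC" if the device is full, else "ok"."""
    try:
        fd = os.open(path, os.O_WRONLY)
        try:
            os.write(fd, data)
        finally:
            os.close(fd)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return "ENOSPC"
        raise  # any other error is unexpected in this sketch
    return "ok"


if __name__ == "__main__":
    # Assumption: a Linux system that provides /dev/full.
    if os.path.exists("/dev/full"):
        print(try_write("/dev/full", b"x"))
```

A client writing through NFS would then see the error instead of a blocked server thread.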
-- Package-specific info:
-- rpcinfo --
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100021 1 udp 44927 nlockmgr
100021 3 udp 44927 nlockmgr
100021 4 udp 44927 nlockmgr
100021 1 tcp 56490 nlockmgr
100021 3 tcp 56490 nlockmgr
100021 4 tcp 56490 nlockmgr
100007 2 udp 811 ypbind
100007 1 udp 811 ypbind
100007 2 tcp 812 ypbind
100007 1 tcp 812 ypbind
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 2 tcp 2049
100227 3 tcp 2049
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 2 udp 2049
100227 3 udp 2049
100005 1 udp 59788 mountd
100005 1 tcp 46566 mountd
100005 2 udp 35896 mountd
100005 2 tcp 52752 mountd
100005 3 udp 59974 mountd
100005 3 tcp 46390 mountd
100024 1 udp 53921 status
100024 1 tcp 59277 status
-- /etc/default/nfs-kernel-server --
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="--manage-gids"
NEED_SVCGSSD=""
RPCSVCGSSDOPTS=""
-- /etc/exports --
[removed for data security reasons]
-- /proc/fs/nfs/exports --
# Version 1.1
# Path Client(Flags) # IPs
[removed for data security reasons]
-- System Information:
Debian Release: 8.8
APT prefers oldstable-updates
APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages nfs-kernel-server depends on:
ii libblkid1 2.25.2-6
ii libc6 2.19-18+deb8u10
ii libcap2 1:2.24-8
ii libsqlite3-0 3.8.7.1-1+deb8u2
ii libtirpc1 0.2.5-1+deb8u1
ii libwrap0 7.6.q-25
ii lsb-base 4.1+Debian13+nmu1
ii nfs-common 1:1.2.8-9
ii ucf 3.0030
nfs-kernel-server recommends no packages.
nfs-kernel-server suggests no packages.
-- no debconf information