
Bug#867595: nfs-kernel-server: nfsd freezes if underlying ext4 file system becomes full



Package: nfs-kernel-server
Version: 1:1.2.8-9
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

after a write to an NFS export filled the underlying ext4 file system
(no free space left), nfsd hung with this trace:

May 30 02:50:16 itmserver2 kernel: INFO: task nfsd:4242 blocked for more than 120 seconds.
May 30 02:50:16 itmserver2 kernel:       Not tainted 3.16.0-4-amd64 #1
May 30 02:50:16 itmserver2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 30 02:50:16 itmserver2 kernel: nfsd            D ffff8807f0202f780  4242      2 0x00000000
May 30 02:50:16 itmserver2 kernel:  ffff8807f0202b20 0000000000000046 0000000000012f40 ffff8807ae473fd8
May 30 02:50:16 itmserver2 kernel:  0000000000012f40 ffff8807f0202b20 ffff8803dd0b4940 ffff8807ae473a80
May 30 02:50:16 itmserver2 kernel:  ffff8803dd0b4944 ffff8807f0202b20 00000000ffffffff ffff8803dd0b4948
May 30 02:50:16 itmserver2 kernel: Call Trace:
May 30 02:50:16 itmserver2 kernel:  [<ffffffff81517aa5>] ? schedule_preempt_disabled+0x25/0x70
May 30 02:50:16 itmserver2 kernel:  [<ffffffff81519503>] ? __mutex_lock_slowpath+0xd3/0x1c0
May 30 02:50:16 itmserver2 kernel:  [<ffffffff8151960b>] ? mutex_lock+0x1b/0x2a
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa02d7ac9>] ? ext4_file_write_iter+0x79/0x3a0 [ext4]
May 30 02:50:16 itmserver2 kernel:  [<ffffffff81190384>] ? cache_grow+0x154/0x240
May 30 02:50:16 itmserver2 kernel:  [<ffffffff811aa420>] ? new_sync_read+0xa0/0xa0
May 30 02:50:16 itmserver2 kernel:  [<ffffffff811aa51f>] ? do_iter_readv_writev+0x5f/0x90
May 30 02:50:16 itmserver2 kernel:  [<ffffffff811aba7b>] ? do_readv_writev+0xbb/0x240
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa02d7a50>] ? ext4_unwritten_wait+0xa0/0xa0 [ext4]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa02d7a50>] ? ext4_unwritten_wait+0xa0/0xa0 [ext4]
May 30 02:50:16 itmserver2 kernel:  [<ffffffff8121ff55>] ? exportfs_decode_fh+0x95/0x2c0
May 30 02:50:16 itmserver2 kernel:  [<ffffffff8108faff>] ? groups_alloc+0x2f/0xe0
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa06298b9>] ? nfsd_vfs_write.isra.12+0x99/0x360 [nfsd]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa062cf59>] ? nfsd_write+0x89/0x110 [nfsd]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa06371ef>] ? nfsd4_write+0x1bf/0x220 [nfsd]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa0638b18>] ? nfsd4_proc_compound+0x4e8/0x7e0 [nfsd]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa0625d32>] ? nfsd_dispatch+0xb2/0x200 [nfsd]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa05d5d71>] ? svc_process_common+0x451/0x6e0 [sunrpc]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa0625630>] ? nfsd_destroy+0x70/0x70 [nfsd]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa05d610c>] ? svc_process+0x10c/0x160 [sunrpc]
May 30 02:50:16 itmserver2 kernel:  [<ffffffffa06256ef>] ? nfsd+0xbf/0x130 [nfsd]
May 30 02:50:16 itmserver2 kernel:  [<ffffffff810894fd>] ? kthread+0xbd/0xe0
May 30 02:50:16 itmserver2 kernel:  [<ffffffff81089440>] ? kthread_create_on_node+0x180/0x180
May 30 02:50:16 itmserver2 kernel:  [<ffffffff8151ad98>] ? ret_from_fork+0x58/0x90
May 30 02:50:16 itmserver2 kernel:  [<ffffffff81089440>] ? kthread_create_on_node+0x180/0x180
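
A minimal sketch of the write pattern that triggers this, assuming the
export is mounted on a client at /mnt/space (the client-side mount
point and mount options are examples, not our exact setup):

# on a client (mount point is hypothetical)
mount -t nfs itmserver2:/srv/space /mnt/space
# keep writing until the file system runs out of space; dd eventually
# fails with "No space left on device", while on the server the nfsd
# threads hang in state D as in the trace above
dd if=/dev/zero of=/mnt/space/fill.img bs=1M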

The filesystem in question:

dumpe2fs 1.42.12 (29-Aug-2014)
Filesystem volume name:   space
Last mounted on:          /srv/space
Filesystem UUID:          ba1e6b57-d1ae-458f-850c-32894eb37a97
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              536870912
Block count:              4294967295
Reserved block count:     375809
Free blocks:              17076270
Free inodes:              533985328
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         4096
Inode blocks per group:   256
RAID stride:              128
RAID stripe width:        1280
Flex block group size:    16
Filesystem created:       Wed May 27 13:50:23 2015
Last mount time:          Tue May 30 09:50:51 2017
Last write time:          Tue May 30 09:50:51 2017
Mount count:              1
Maximum mount count:      -1
Last checked:             Tue May 30 09:33:30 2017
Check interval:           0 (<none>)
Lifetime writes:          219 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      b2a64a16-0afc-40aa-8a92-a5a65fca1d5a
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x007ab78e
Journal start:            31676

Please note the file system size of 16 TiB (a block count of
4294967295, i.e. 2^32 - 1 blocks of 4096 bytes each), which is the
maximum addressable with 32-bit block numbers. Maybe this is relevant.
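
As a quick check of that limit (plain shell arithmetic, nothing
system-specific):

# 2^32 block numbers * 4096-byte blocks = 2^44 bytes = 16 TiB
echo $(( 4294967296 * 4096 ))          # 17592186044416 bytes
echo $(( (4294967296 * 4096) >> 40 ))  # 16 (TiB)
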
The filesystem was created on a logical volume:

--- Logical volume ---
LV Path                /dev/md0_VG/space
LV Name                space
VG Name                md0_VG
LV UUID                63JVuX-yu5E-qsrw-zCP0-X0S8-Erkv-ZYTxwk
LV Write Access        read/write
LV Creation host, time itmserver2, 2015-05-27 13:23:24 +0200
LV Status              available
# open                 1
LV Size                17.00 TiB
Current LE             34816
Segments               2
Allocation             inherit
Read ahead sectors     auto
- currently set to     20480
Block device           254:1

and was resized several times online. The LV is part of a VG:

--- Volume group ---
VG Name               md0_VG
System ID             
Format                lvm2
Metadata Areas        1
Metadata Sequence No  59
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                5
Open LV               5
Max PV                0
Cur PV                1
Act PV                1
VG Size               36.39 TiB
PE Size               512.00 MiB
Total PE              74517
Alloc PE / Size       52324 / 25.55 TiB
Free  PE / Size       22193 / 10.84 TiB
VG UUID               ARRCBo-se6D-zNRI-Uz6m-f3Bc-0D2L-PdHGrO

which consists of only one physical volume:

--- Physical volume ---
PV Name               /dev/md0
VG Name               md0_VG
PV Size               36.39 TiB / not usable 491.00 MiB
Allocatable           yes 
PE Size               512.00 MiB
Total PE              74517
Free PE               22193
Allocated PE          52324
PV UUID               7edwwI-7bVb-dDN8-wdXc-Je9R-J5Kn-KvPaXF

This physical volume is a RAID 6.

Our NFS server runs Debian jessie (oldstable) with this kernel:

Linux itmserver2 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2 (2017-04-30) x86_64 GNU/Linux

After the freeze, the system became unstable and behaved erratically
whenever the file system was accessed. Restarting the nfsd service did
not help, and unloading the NFS kernel module was impossible because
it was still in use by the unkillable nfsd daemons (a D-state check is
sketched below). Even the lsof command stopped responding. Only a
reboot of the server helped, which is why this bug is reported as
critical. Before the reboot we noticed high CPU usage by the process
kworker/0:3 and high I/O (observed with iotop) from jbd2/sda5-8. Note
that /dev/sda5 holds only the /var directory and is on a hard disk
completely separate from the exported file system.
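
One generic way to confirm that the daemons were stuck in
uninterruptible sleep (process state D, matching the hung-task trace),
assuming ps itself still responds:

# list processes in uninterruptible sleep; the hung nfsd threads show up here
ps axo pid,stat,comm | awk '$2 ~ /^D/'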

After the restart we checked the file system:

itmserver2:~ # fsck -f /dev/md0_VG/space
fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
space: recovering journal
Clearing orphaned inode 253823871 (uid=3144, gid=3100, mode=0100640, size=233609)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (13130177, counted=17076270).
Fix<y>? yes
Free inodes count wrong (533998561, counted=533985328).
Fix<y>? yes

The freeze has happened twice so far. Since this is a production
system and the freeze rendered many clients unusable, we did not have
time to analyse the problem in depth and restarted the server instead.
Each time, a user reported that a script on a client had written large
quantities of data to the NFS mount. Indeed, on inspection the file
system was _almost_ full (0.5 % free, while the reserved blocks amount
to only 0.01 %), so it presumably hit its limit and aborted the last
write, leaving the file system almost full. Our assumption is that
nfsd froze at the moment the file system ran out of space.

The expected behaviour would be for the write to fail with an I/O
error (ENOSPC, "No space left on device") instead of nfsd freezing.
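
For comparison, the same write issued locally on the server should
simply fail; a minimal sketch (the test file name is only an example):

# a direct write on the full file system returns ENOSPC instead of hanging
dd if=/dev/zero of=/srv/space/enospc-test bs=1M count=1 \
    && echo "unexpectedly succeeded" \
    || echo "failed as expected: No space left on device"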

-- Package-specific info:
-- rpcinfo --
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp  44927  nlockmgr
    100021    3   udp  44927  nlockmgr
    100021    4   udp  44927  nlockmgr
    100021    1   tcp  56490  nlockmgr
    100021    3   tcp  56490  nlockmgr
    100021    4   tcp  56490  nlockmgr
    100007    2   udp    811  ypbind
    100007    1   udp    811  ypbind
    100007    2   tcp    812  ypbind
    100007    1   tcp    812  ypbind
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100005    1   udp  59788  mountd
    100005    1   tcp  46566  mountd
    100005    2   udp  35896  mountd
    100005    2   tcp  52752  mountd
    100005    3   udp  59974  mountd
    100005    3   tcp  46390  mountd
    100024    1   udp  53921  status
    100024    1   tcp  59277  status
-- /etc/default/nfs-kernel-server --
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="--manage-gids"
NEED_SVCGSSD=""
RPCSVCGSSDOPTS=""
-- /etc/exports --
[removed for data security reasons]
-- /proc/fs/nfs/exports --
# Version 1.1
# Path Client(Flags) # IPs
[removed for data security reasons]

-- System Information:
Debian Release: 8.8
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nfs-kernel-server depends on:
ii  libblkid1     2.25.2-6
ii  libc6         2.19-18+deb8u10
ii  libcap2       1:2.24-8
ii  libsqlite3-0  3.8.7.1-1+deb8u2
ii  libtirpc1     0.2.5-1+deb8u1
ii  libwrap0      7.6.q-25
ii  lsb-base      4.1+Debian13+nmu1
ii  nfs-common    1:1.2.8-9
ii  ucf           3.0030

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf information

