Bug#880549: nfs-kernel-server: NFSv4 sec=krb5p option broken on stretch
Package: nfs-kernel-server
Version: 1:1.3.4-2.1
Severity: important
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I have recently upgraded my NFS server from Jessie to Stretch. It was
the last system on my network that needed to be upgraded. After the
upgrade, I found that NFSv4 was nearly unusable. All user home
directories, as well as backup and scratch space, are mounted via NFS.
Users trying to log in to a default GNOME session would experience hangs
that appeared to never resolve themselves. Logging in through ssh
resulted in ~2 minutes delay before a bash prompt would appear.
I attempted several things to troubleshoot the problem, including
reverting to kernel 3.16.0 from Jessie and nfs-utils 1.2.8-9.1. I tried
going all the way back to nfs-utils 1.2.8-9, but the version of systemd
in Stretch has "nfs-common (<< 1:1.2.8-9.1)". Even after reverting the
problem persisted. Based on that, I think that the problem may have
been introduced as far back as 1.2.8-9.1.
The problem also manifested itself in the form of problems for users
logged directly into the NFS server. Even though file accesses are
local in that scenario, if an NFS client had attemped to access a file
and hung on it, then it would also cause problems on the NFS server.
For example, I have a large MBOX file that I access with mutt. If I
access the file from an NFS client then it appears to hang with messages
like this flooding syslog on the client:
vmunix: [918369.611122] NFS: nfs4_reclaim_open_state: Lock reclaim failed
Trying to then open that same mbox with mutt on the NFS server causes
mutt to segfault because the call to fcntl() returns an error.
The only thing that resolved the problem was switching my exports from
sec=krb5p to sec=krb5i. Of course, this is considerably less desirable
than krb5p in some cases.
I believe the that problem is with rpc.svcgssd. Here is part of what
got captured in syslog during a failure:
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: using domain: example.com
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: Realms list: 'EXAMPLE.COM'
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: processing 'Method' list
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: sname = nfs/ws1.example.com@EXAMPLE.COM
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509615211 (25678 from now), clnt: nfs@ws1.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example.com@EXAMPLE.COM
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14448 from now), clnt: nfs@ws2.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example.com@EXAMPLE.COM
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13881 from now), clnt: nfs@ws3.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example.com@EXAMPLE.COM
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: nfs@ws3.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example.com@EXAMPLE.COM
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: nfs@ws3.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example.com@EXAMPLE.COM
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14280 from now), clnt: nfs@ws2.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: sname = roberto@EXAMPLE.COM
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509623547 (33835 from now), clnt: <null>, uid: 3167, gid: 100, num aux grps: 17:
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 1) 100
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 2) 4
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 3) 24
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 4) 29
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 5) 33
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 6) 44
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 7) 46
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 8) 50
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 9) 60
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 10) 113
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 11) 115
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 12) 125
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 13) 132
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 14) 500
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 15) 501
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 16) 502
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 17) 503
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example.com@EXAMPLE.COM
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14260 from now), clnt: nfs@ws2.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:32:42 nfssrv rpc.svcgssd[30152]: exiting on signal 15
- -- Package-specific info:
- -- rpcinfo --
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
391002 2 tcp 650 sgi_fam
100005 1 udp 32767 mountd
100005 1 tcp 32767 mountd
100005 2 udp 32767 mountd
100005 2 tcp 32767 mountd
100005 3 udp 32767 mountd
100005 3 tcp 32767 mountd
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 3 udp 2049
100021 1 udp 52460 nlockmgr
100021 3 udp 52460 nlockmgr
100021 4 udp 52460 nlockmgr
100021 1 tcp 37351 nlockmgr
100021 3 tcp 37351 nlockmgr
100021 4 tcp 37351 nlockmgr
- -- /etc/default/nfs-kernel-server --
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="-p 32767"
NEED_SVCGSSD="yes"
RPCSVCGSSDOPTS=""
- -- /etc/exports --
/srv/nfs 192.168.0.0/16(sync,rw,fsid=0,crossmnt,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/home 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/backup 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/scratch 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
- -- System Information:
Debian Release: 9.2
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.9.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages nfs-kernel-server depends on:
ii init-system-helpers 1.48
ii keyutils 1.5.9-9
ii libblkid1 2.29.2-1
ii libc6 2.24-11+deb9u1
ii libcap2 1:2.25-1
ii libsqlite3-0 3.16.2-5
ii libtirpc1 0.2.5-1.2
ii libwrap0 7.6.q-26
ii lsb-base 9.20161125
ii netbase 5.4
ii nfs-common 1:1.3.4-2.1
ii ucf 3.0036
nfs-kernel-server recommends no packages.
nfs-kernel-server suggests no packages.
- -- no debconf information
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEz9ERzDttUsU/BH8iLNd4Xt2nsg8FAln6mi4ACgkQLNd4Xt2n
sg+OPRAAksDf+IR6dTn1Gl8OM4IOMVi5S48i+S6n3sDy9Uv04zRX1sdvTuzvGF1d
4mRqPson+eQ3CnuMQIH8XOMZJdrZoB7bsjMWjZC9FgBn42oE7wOxcMJW+FC1qCYe
AJMan4ssGDec62XnBkkZY2byhvbXygEmSq/cCQlCKp3amwe3VEm+lqf1qFKl2euV
qrnkgk7aAgrH6zu+q5WgAyrEZzvDi7NbcGheUe0NitAKEu2zR+Iq6mMOEP9rd91s
sgJ2tCHvwZtE6FoUHk9GJrluZyzItBIGV+aaA9PFjJcoQgxB5mE5cS0vFlfXiVbm
xS9v67UJXp/H7R5S+mnoMDqfKi+TCW+FaLEj5s+4bpYd7sk6Pyqugj3yEj6OYKAw
u2G7iB4MA64roxGeIEghgPvW/Zmuvpa+pi6RjOUcyvO4lyNGMVUM7TP59fWuXbwv
kwQ+coBI6SxAUnKOiA9MADHZcUxjdhPWckkTFagijv9mipAY9+G6+fWBY3iFZrMO
vrGABHecEzV5TzKv4J+sSGl5581/+vImK3SIUA6ILBED1SkWmFFonekE9P0YDgFr
tR2+vcCvhQehYSQW/hifJpbY/3puUXYOvqQmv+Be3AVw6hBaxVfzY1yjkDemjKXr
oHmDZ5L76TokKoiv45JYIWAr8sjNsxScDUlhnP+Ou7Tvi8eUIqM=
=CbNN
-----END PGP SIGNATURE-----
Reply to: