[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#880549: nfs-kernel-server: NFSv4 sec=krb5p option broken on stretch



Package: nfs-kernel-server
Version: 1:1.3.4-2.1
Severity: important

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

I have recently upgraded my NFS server from Jessie to Stretch.  It was
the last system on my network that needed to be upgraded.  After the
upgrade, I found that NFSv4 was nearly unusable.  All user home
directories, as well as backup and scratch space, are mounted via NFS.
Users trying to log in to a default GNOME session would experience hangs
that appeared to never resolve themselves.  Logging in through ssh
resulted in ~2 minutes delay before a bash prompt would appear.

I attempted several things to troubleshoot the problem, including
reverting to kernel 3.16.0 from Jessie and nfs-utils 1.2.8-9.1.  I tried
going all the way back to nfs-utils 1.2.8-9, but the version of systemd
in Stretch has "nfs-common (<< 1:1.2.8-9.1)".  Even after reverting the
problem persisted.  Based on that, I think that the problem may have
been introduced as far back as 1.2.8-9.1.

The problem also manifested itself in the form of problems for users
logged directly into the NFS server.  Even though file accesses are
local in that scenario, if an NFS client had attemped to access a file
and hung on it, then it would also cause problems on the NFS server.
For example, I have a large MBOX file that I access with mutt.  If I
access the file from an NFS client then it appears to hang with messages
like this flooding syslog on the client:

vmunix: [918369.611122] NFS: nfs4_reclaim_open_state: Lock reclaim failed

Trying to then open that same mbox with mutt on the NFS server causes
mutt to segfault because the call to fcntl() returns an error.

The only thing that resolved the problem was switching my exports from
sec=krb5p to sec=krb5i.  Of course, this is considerably less desirable
than krb5p in some cases.

I believe the that problem is with rpc.svcgssd.  Here is part of what
got captured in syslog during a failure:

Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: using domain: example.com
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: Realms list: 'EXAMPLE.COM' 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: processing 'Method' list
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: sname = nfs/ws1.example.com@EXAMPLE.COM 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509615211 (25678 from now), clnt: nfs@ws1.example.com, uid: -1, gid: -1, num aux grps: 0: 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example.com@EXAMPLE.COM 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14448 from now), clnt: nfs@ws2.example.com, uid: -1, gid: -1, num aux grps: 0: 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:25:34 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example.com@EXAMPLE.COM 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13881 from now), clnt: nfs@ws3.example.com, uid: -1, gid: -1, num aux grps: 0: 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:26:12 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example.com@EXAMPLE.COM 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: nfs@ws3.example.com, uid: -1, gid: -1, num aux grps: 0: 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example.com@EXAMPLE.COM 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: nfs@ws3.example.com, uid: -1, gid: -1, num aux grps: 0: 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example.com@EXAMPLE.COM 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14280 from now), clnt: nfs@ws2.example.com, uid: -1, gid: -1, num aux grps: 0: 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:28:22 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: sname = roberto@EXAMPLE.COM 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509623547 (33835 from now), clnt: <null>, uid: 3167, gid: 100, num aux grps: 17: 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   1) 100 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   2) 4 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   3) 24 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   4) 29 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   5) 33 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   6) 44 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   7) 46 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   8) 50 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (   9) 60 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  10) 113 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  11) 115 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  12) 125 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  13) 132 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  14) 500 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  15) 501 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  16) 502 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]:   (  17) 503 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:28:32 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: leaving poll 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: handling null request 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example.com@EXAMPLE.COM 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: doing downcall 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14260 from now), clnt: nfs@ws2.example.com, uid: -1, gid: -1, num aux grps: 0: 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: sending null reply 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: finished handling null request 
Nov  1 22:28:42 nfssrv rpc.svcgssd[30152]: entering poll 
Nov  1 22:32:42 nfssrv rpc.svcgssd[30152]: exiting on signal 15 


- -- Package-specific info:
- -- rpcinfo --
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    391002    2   tcp    650  sgi_fam
    100005    1   udp  32767  mountd
    100005    1   tcp  32767  mountd
    100005    2   udp  32767  mountd
    100005    2   tcp  32767  mountd
    100005    3   udp  32767  mountd
    100005    3   tcp  32767  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    3   udp   2049
    100021    1   udp  52460  nlockmgr
    100021    3   udp  52460  nlockmgr
    100021    4   udp  52460  nlockmgr
    100021    1   tcp  37351  nlockmgr
    100021    3   tcp  37351  nlockmgr
    100021    4   tcp  37351  nlockmgr
- -- /etc/default/nfs-kernel-server --
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="-p 32767"
NEED_SVCGSSD="yes"
RPCSVCGSSDOPTS=""
- -- /etc/exports --
/srv/nfs	192.168.0.0/16(sync,rw,fsid=0,crossmnt,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/home	192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/backup	192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/scratch	192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)

- -- System Information:
Debian Release: 9.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.9.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nfs-kernel-server depends on:
ii  init-system-helpers  1.48
ii  keyutils             1.5.9-9
ii  libblkid1            2.29.2-1
ii  libc6                2.24-11+deb9u1
ii  libcap2              1:2.25-1
ii  libsqlite3-0         3.16.2-5
ii  libtirpc1            0.2.5-1.2
ii  libwrap0             7.6.q-26
ii  lsb-base             9.20161125
ii  netbase              5.4
ii  nfs-common           1:1.3.4-2.1
ii  ucf                  3.0036

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

- -- no debconf information

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEz9ERzDttUsU/BH8iLNd4Xt2nsg8FAln6mi4ACgkQLNd4Xt2n
sg+OPRAAksDf+IR6dTn1Gl8OM4IOMVi5S48i+S6n3sDy9Uv04zRX1sdvTuzvGF1d
4mRqPson+eQ3CnuMQIH8XOMZJdrZoB7bsjMWjZC9FgBn42oE7wOxcMJW+FC1qCYe
AJMan4ssGDec62XnBkkZY2byhvbXygEmSq/cCQlCKp3amwe3VEm+lqf1qFKl2euV
qrnkgk7aAgrH6zu+q5WgAyrEZzvDi7NbcGheUe0NitAKEu2zR+Iq6mMOEP9rd91s
sgJ2tCHvwZtE6FoUHk9GJrluZyzItBIGV+aaA9PFjJcoQgxB5mE5cS0vFlfXiVbm
xS9v67UJXp/H7R5S+mnoMDqfKi+TCW+FaLEj5s+4bpYd7sk6Pyqugj3yEj6OYKAw
u2G7iB4MA64roxGeIEghgPvW/Zmuvpa+pi6RjOUcyvO4lyNGMVUM7TP59fWuXbwv
kwQ+coBI6SxAUnKOiA9MADHZcUxjdhPWckkTFagijv9mipAY9+G6+fWBY3iFZrMO
vrGABHecEzV5TzKv4J+sSGl5581/+vImK3SIUA6ILBED1SkWmFFonekE9P0YDgFr
tR2+vcCvhQehYSQW/hifJpbY/3puUXYOvqQmv+Be3AVw6hBaxVfzY1yjkDemjKXr
oHmDZ5L76TokKoiv45JYIWAr8sjNsxScDUlhnP+Ou7Tvi8eUIqM=
=CbNN
-----END PGP SIGNATURE-----


Reply to: