Bug#831033: libc6: NSS (compat/nis) randomly fails for getent*
Package: libc6
Severity: important
-- System Information:
Debian Release: 8.5
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 3.16.0-4-amd64 (SMP w/64 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
We are seeing random failures to map UIDs to usernames:
** Note: joebob and bigcomputer are fabricated names
joebob@bigcomputer:~$ whoami
joebob
joebob@bigcomputer:~$ whoami
whoami: cannot find name for user ID 1234
joebob@bigcomputer:~$ whoami
whoami: cannot find name for user ID 1234
joebob@bigcomputer:~$ whoami
joebob
joebob@bigcomputer:~$ whoami
joebob
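To put a number on how often the UID-to-name lookup fails, one could repeat the kind of query whoami depends on in a loop. This is a minimal POSIX sh sketch. It assumes glibc's getent(1), which exits non-zero when a key cannot be resolved. count_uid_failures is a hypothetical helper name and is not part of any package:

```shell
#!/bin/sh
# Hypothetical helper (not from any package): look up one UID repeatedly
# through NSS and report how many lookups failed.
# Usage: count_uid_failures [UID] [TRIES]
count_uid_failures() {
    uid=${1:-$(id -u)}      # default: the caller's own numeric UID
    tries=${2:-100}
    fails=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        # getent(1) exits non-zero when the UID cannot be mapped to an entry
        if ! getent passwd "$uid" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails/$tries lookups of UID $uid failed"
}
```

For example, running count_uid_failures 1234 500 on the affected host would give a failure rate. On a healthy system every lookup of an existing UID should succeed.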
But
ypcat passwd.byname | grep joebob
consistently works, and
ypcat passwd.byname | wc -l
always returns the same value (a number I expect to be above 1,000, and definitely not zero ;) ). So NIS itself appears to be functioning correctly.
However,
I have no name!@bigcomputer:~$ while getent passwd | wc -l; do sleep 1; done
2462
2462
2462
2462
2462
1377
85
36
36
36
36
36
36
50
So getent passwd randomly fails to return the full set of entries.
/etc/nsswitch.conf contains:
passwd: compat
group: compat nis *
netgroup: nis
* This has been on our systems for years because of bug 584914.
I have also tried the following alternatives, without success:
passwd: compat nis (to see if 584914 is related)
passwd: files nis
passwd: compat [NOTFOUND=continue] compat [NOTFOUND=continue] compat [NOTFOUND=continue] compat
I tried the last one because libnss_nis appears to return NOTFOUND rather than UNAVAIL, so I was hoping to force multiple retries, but I'm not sure this configuration actually does what I intended.
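An alternative to retrying through nsswitch.conf action syntax is to retry at the application level. This is a minimal POSIX sh sketch and a stopgap only, not a fix. It assumes glibc's getent(1) and that a transient failure shows up as a non-zero exit status. getent_retry is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical workaround (not a fix): retry a getent lookup at the
# application level instead of via nsswitch.conf action syntax.
# Usage: getent_retry ATTEMPTS DATABASE [KEY...]
getent_retry() {
    attempts=$1
    shift
    i=1
    while :; do
        # On success return immediately; getent has already printed the entry.
        getent "$@" && return 0
        status=$?           # exit status of the failed getent call
        if [ "$i" -ge "$attempts" ]; then
            return "$status"
        fi
        i=$((i + 1))
        sleep 1             # brief back-off before the next attempt
    done
}
```

For example, getent_retry 5 passwd joebob would try up to five times before giving up, returning getent's last exit status.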
Also:
getent -s nis passwd joebob
getent -s compat passwd joebob
both exhibit the same random failure/success pattern (so it's not just libnss_compat).
When a lookup fails, getent returns exit status 2 ("One or more supplied key could not be found in the database").
So it seems to me that the common component here is libnss_nis.so.
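To compare the services side by side, one could query each one directly with getent -s and tally the exit statuses. This is a minimal POSIX sh sketch. It assumes glibc's getent(1), which supports -s SERVICE and uses exit status 2 for "key not found". tally_service is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: query one NSS service directly (glibc getent -s)
# and tally the exit statuses, to compare e.g. nis against compat.
# Usage: tally_service SERVICE KEY [TRIES]
tally_service() {
    svc=$1
    key=$2
    tries=${3:-50}
    ok=0 notfound=0 other=0 i=0
    while [ "$i" -lt "$tries" ]; do
        getent -s "$svc" passwd "$key" >/dev/null 2>&1
        case $? in
            0) ok=$((ok + 1)) ;;
            2) notfound=$((notfound + 1)) ;;   # "key could not be found"
            *) other=$((other + 1)) ;;         # any other failure
        esac
        i=$((i + 1))
    done
    echo "$svc: ok=$ok notfound=$notfound other=$other"
}
```

Running tally_service nis joebob 200 and tally_service compat joebob 200 under load would show whether both services fail at a similar rate, and whether they fail with status 2 or with some other status.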
The machine this is running on is a rather beefy server:
Dell PowerEdge R820
256GB RAM
4 x Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz
The problem appears when the system is heavily loaded, so it looks like a race condition somewhere.
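To check whether the drops in the getent passwd count actually coincide with load, one could extend the loop above so that it also logs the load average. This is a minimal POSIX sh sketch. It assumes Linux, because it reads /proc/loadavg, and it assumes GNU-style date +%s. log_passwd_count is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical monitoring loop: log the number of passwd entries that
# getent returns next to the 1-minute load average, so drops in the
# count can be correlated with load. Linux-only (reads /proc/loadavg).
# Usage: log_passwd_count [SAMPLES] [INTERVAL_SECONDS]
log_passwd_count() {
    samples=${1:-60}
    interval=${2:-1}
    i=0
    while [ "$i" -lt "$samples" ]; do
        n=$(getent passwd | wc -l)
        # first field of /proc/loadavg is the 1-minute load average
        read -r load1 rest < /proc/loadavg
        # columns: UNIX timestamp, 1-minute load, passwd entry count
        printf '%s %s %s\n' "$(date +%s)" "$load1" "$n"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Its output has one line per sample, so it can be redirected to a file and lined up afterwards against the times when the count fell from 2462.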
I may not be able to do much more testing, as the system is being reconfigured on an emergency basis to remove its NIS dependency, but let me know if you need any further information or testing.
BTW, nscd is NOT running. In bug reports about similar libnss_nis/compat issues I see many people saying nscd made no difference, so I didn't test with it.
thanks,
--stephen