
Bug#831033: libc6: NSS (compat/nis) randomly fails for getent*



Package: libc6
Severity: important



-- System Information:
Debian Release: 8.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/64 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)


We are seeing random failures when mapping UIDs to usernames:

** Note: joebob and bigcomputer are fabricated names

	joebob@bigcomputer:~$ whoami
	joebob
	joebob@bigcomputer:~$ whoami
	whoami: cannot find name for user ID 1234
	joebob@bigcomputer:~$ whoami
	whoami: cannot find name for user ID 1234
	joebob@bigcomputer:~$ whoami
	joebob
	joebob@bigcomputer:~$ whoami
	joebob
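For reference, coreutils whoami resolves its effective UID with getpwuid(3) and prints "cannot find name for user ID" when no entry comes back. The minimal C sketch below uses the reentrant getpwuid_r(3) instead, because its return value separates "no entry found" (0 with a NULL result) from "the lookup itself failed" (a nonzero error number). That may help narrow down what the failing lookups actually return. lookup_uid and enum lookup_result are hypothetical names, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Possible outcomes of one UID -> user name lookup (hypothetical names). */
enum lookup_result { LOOKUP_ERROR = -1, LOOKUP_FOUND = 0, LOOKUP_NOT_FOUND = 1 };

/*
 * Resolve uid through NSS and copy the user name into name[0..len-1].
 * *err receives getpwuid_r's return value (0, or an error number) so that
 * failed lookups can be logged.
 */
enum lookup_result lookup_uid(uid_t uid, char *name, size_t len, int *err)
{
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t bufsize = hint > 0 ? (size_t)hint : 16384;
    struct passwd pw;
    struct passwd *result = NULL;
    char *buf = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, bufsize);
        if (tmp == NULL) {
            free(buf);
            *err = ENOMEM;
            return LOOKUP_ERROR;
        }
        buf = tmp;
        rc = getpwuid_r(uid, &pw, buf, bufsize, &result);
        if (rc != ERANGE)
            break;
        bufsize *= 2; /* entry did not fit: grow the buffer and retry */
    }

    *err = rc;
    enum lookup_result outcome;
    if (rc != 0) {
        outcome = LOOKUP_ERROR;      /* the lookup failed, see *err */
    } else if (result == NULL) {
        outcome = LOOKUP_NOT_FOUND;  /* no matching entry */
    } else {
        outcome = LOOKUP_FOUND;
        if (len > 0) {               /* copy before buf (which pw_name points into) is freed */
            strncpy(name, pw.pw_name, len - 1);
            name[len - 1] = '\0';
        }
    }
    free(buf);
    return outcome;
}
```

On the affected host one might call lookup_uid(geteuid(), ...) in a loop and log *err every time the outcome is not LOOKUP_FOUND.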

but
	ypcat passwd.byname | grep joebob

consistently works properly.

	ypcat passwd.byname | wc -l

always returns the same value (a number I expect to be above 1,000, and definitely not zero ;) ), so NIS itself appears to be functioning correctly.


However,

I have no name!@bigcomputer:~$ while getent passwd | wc -l; do sleep 1; done
2462
2462
2462
2462
2462
1377
85
36
36
36
36
36
36
50
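With no key, getent passwd walks the whole database through setpwent(3)/getpwent(3)/endpwent(3). The C sketch below repeats that walk inside one process so the entry count can be compared across rounds without a shell pipeline. count_passwd_entries and passwd_enumeration_is_stable are hypothetical helpers. Note that getpwent(3) returns NULL both at the end of the database and on error, so a short count alone cannot tell those two cases apart.

```c
#define _DEFAULT_SOURCE  /* exposes setpwent/getpwent/endpwent in glibc */
#include <pwd.h>
#include <stddef.h>

/* One full enumeration of passwd, the in-process analogue of
 * `getent passwd | wc -l`. */
long count_passwd_entries(void)
{
    long n = 0;
    setpwent();                  /* start from the first entry */
    while (getpwent() != NULL)   /* NULL: end of database or an error */
        n++;
    endpwent();                  /* close the enumeration */
    return n;
}

/* Enumerate `rounds` times; return 1 if every round produced the same
 * count, 0 otherwise. The first round's count is stored in *first. */
int passwd_enumeration_is_stable(int rounds, long *first)
{
    long expected = count_passwd_entries();
    if (first != NULL)
        *first = expected;
    for (int i = 1; i < rounds; i++) {
        if (count_passwd_entries() != expected)
            return 0;
    }
    return 1;
}
```

On a healthy system every round should return the same count; on the affected host the counts would be expected to drift the way the getent loop output does.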

So, getent passwd randomly fails to return the full list of entries.


/etc/nsswitch.conf contains:

	passwd:         compat
	group:          compat nis  *
	netgroup:       nis
* This has been on our systems for years because of bug #584914.

I have also tried the following alternatives, without success:

passwd:  compat nis  (to see if 584914 is related)
passwd:  files nis
passwd:  compat [NOTFOUND=continue] compat [NOTFOUND=continue] compat [NOTFOUND=continue] compat

I tried the last variant because libnss_nis appears to return NOTFOUND rather than UNAVAIL, so I was hoping to force multiple retries. However, I'm not sure this configuration actually does what I intended.
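If retrying inside nsswitch.conf does not behave as hoped, one hedged alternative, only for programs we control (it would not help whoami or other existing binaries), is to retry the lookup in application code. The C sketch below is a hypothetical workaround, not a fix: lookup_uid_with_retries is an illustrative name, and the fixed 16 KiB buffer is an assumption about entry size.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * HYPOTHETICAL application-side workaround: retry a UID lookup up to
 * `attempts` times, pausing `delay_ms` (assumed >= 0) between attempts.
 * Returns 0 and fills name on success, -1 if every attempt came back
 * empty or failed.
 */
int lookup_uid_with_retries(uid_t uid, char *name, size_t len,
                            int attempts, long delay_ms)
{
    char buf[16384];   /* assumed large enough for one passwd entry */
    struct passwd pw;
    struct passwd *result;
    struct timespec pause = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int i = 0; i < attempts; i++) {
        result = NULL;
        if (getpwuid_r(uid, &pw, buf, sizeof buf, &result) == 0 && result != NULL) {
            snprintf(name, len, "%s", pw.pw_name);
            return 0;
        }
        if (i + 1 < attempts)
            nanosleep(&pause, NULL);  /* back off before the next attempt */
    }
    return -1;
}
```

A retry only masks the symptom; if lookups fail for long stretches (as in the getent loop, where the count stayed at 36 for several seconds), short retries may still not be enough.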


Also:

   getent -s nis passwd joebob
   getent -s compat passwd joebob

Both exhibit the same random failure/success pattern, so the problem is not limited to libnss_compat.

When a lookup fails, getent exits with status 2 ("One or more supplied key could not be found in the database.").


So, it seems to me that the common component here is libnss_nis.so.

The machine this is running on is a rather beefy server:

Dell PowerEdge R820
256GB RAM
4 x Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz


The problem shows up when the system is heavily loaded, which makes me suspect a race condition somewhere.
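To test the race-condition hypothesis, it may help to make lookups overlap on purpose rather than waiting for the machine to get busy. The minimal C sketch below forks several processes that each call getpwuid_r(3) repeatedly for one UID and reports how many lookups came back empty or failed. stress_uid_lookups is a hypothetical helper. It can only show whether concurrent lookups fail, not where a race might lie. Processes rather than threads are used so no extra linker flags are needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stress test: fork `procs` children that each resolve `uid`
 * `iterations` times, so many NSS lookups run at the same moment.
 * Returns the total number of failed lookups (each child's count is capped
 * at 255 by the exit-status range), or -1 if fork() or waitpid() failed.
 */
int stress_uid_lookups(uid_t uid, int procs, int iterations)
{
    if (procs <= 0)
        return 0;

    pid_t pids[procs];
    int started = 0, total = 0, failed = 0;

    for (int p = 0; p < procs; p++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {                   /* child: run the lookups */
            char buf[16384];              /* assumed big enough per entry */
            struct passwd pw;
            struct passwd *result;
            int failures = 0;
            for (int i = 0; i < iterations; i++) {
                result = NULL;
                if (getpwuid_r(uid, &pw, buf, sizeof buf, &result) != 0 || result == NULL)
                    failures++;
            }
            _exit(failures > 255 ? 255 : failures); /* _exit: no inherited stdio flush */
        }
        pids[started++] = pid;
    }

    /* Reap every child that was started, even after a failed fork(). */
    for (int p = 0; p < started; p++) {
        int status;
        if (waitpid(pids[p], &status, 0) < 0) {
            failed = 1;
        } else if (WIFEXITED(status)) {
            total += WEXITSTATUS(status);
        } else {
            total += iterations;          /* child died abnormally: count all its lookups */
        }
    }
    return failed ? -1 : total;
}
```

On the affected host, something like stress_uid_lookups(1234, 32, 1000) with the UID from the whoami transcript could be tried; a nonzero result while ypcat passwd.byname still lists the user would be consistent with the behaviour reported here.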

I may not be able to do much more testing, as the system is being urgently reconfigured to remove its NIS dependency, but let me know if you need any further information or testing.

BTW, nscd is NOT running. In many of the bug reports about these libnss_nis/compat issues, people say nscd made no difference, so I didn't test with it.


thanks,
--stephen

