Bug#513635: nss-ldap timeout=0 by default caused nscd 100% cpu loop
We too have seen this behavior with Ubuntu Hardy and Intrepid
but may be seeing multiple problems.
There is an issue with the /etc/ldap.conf file. This has the comment:
# Search timelimit
#timelimit 30
But the default in the nss-ldap code is NO_LIMIT! We are now testing by
uncommenting this line.
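
To make the mismatch concrete, here is a rough sketch (Python, not nss-ldap code) of how a commented-out timelimit line leaves the built-in default in force. The function name effective_timelimit is hypothetical, default=0 stands in for the no-limit default described above, and letting the last occurrence win is an assumption of this sketch rather than something checked against the nss-ldap parser.

```python
def effective_timelimit(conf_text, default=0):
    """Return the search time limit an ldap.conf-style text would yield.

    default=0 mirrors the "no limit" default described in this report;
    it is an assumption of the sketch, not a value read from nss-ldap.
    """
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines contribute nothing.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Assumption: the last uncommented "timelimit" line wins.
        if parts[0].lower() == "timelimit" and len(parts) >= 2:
            value = int(parts[1])
    return value


shipped = "# Search timelimit\n#timelimit 30\n"
fixed = "# Search timelimit\ntimelimit 30\n"
print(effective_timelimit(shipped))  # 0
print(effective_timelimit(fixed))    # 30
```

With the file as shipped, the sketch reports 0 (no limit); only after the line is uncommented does it report 30.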
We have seen this on more than one machine. One nscd worker thread
calls nss-ldap, which then calls ldap_result with a timelimit of 0.
ldap_result calls ldap_int_select, which calls poll. netstat -n | grep tcp
shows the connection in CLOSE_WAIT. This thread holds an nss_ldap lock, so
all the other threads are waiting for it at _nss_ldap_enter, and
no worker threads are doing any work.
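
For comparison, here is a small sketch of how a waiter can generally notice that its peer has closed the connection, by polling with a finite timeout and peeking at the socket. This is not the libldap code path; peer_closed and the 1000 ms default are hypothetical, and it relies on select.poll, which is available on Linux but not on every platform.

```python
import select
import socket


def peer_closed(sock, timeout_ms=1000):
    """Return True if the peer has closed (or the socket is broken).

    Returns False if nothing happened within timeout_ms or if real
    data is waiting to be read.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLHUP | select.POLLERR)
    if not poller.poll(timeout_ms):
        return False  # finite timeout expired, no event
    try:
        # Peek without consuming: an empty read means end-of-file,
        # i.e. the other side has closed its end.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        # A reset or other socket error also makes the connection unusable.
        return True


a, b = socket.socketpair()
print(peer_closed(a, 100))  # False
b.close()
print(peer_closed(a, 100))  # True
a.close()
```

The point of the finite timeout is that the caller regains control even when no event ever arrives, so a lock held around the wait is eventually released.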
As new requests are received by nscd, it does the accept and queues them
for a worker thread. Each request uses an fd.
As each caller to nscd does not get a response, it times out (after some
seconds) and appears to do its own ldap query, so things sort of work, but
slowly. The used fd count (/proc/<nscd-pid>/fd) continues to rise.
Eventually nscd runs out of fds and goes into the 100% cpu loop trying to
do accepts.
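
The rising fd count can be watched with something like the following sketch, which counts the entries under /proc/<pid>/fd and reads the soft limit from /proc/<pid>/limits. It is Linux-only, needs permission to read the target process's /proc entries, and the helper names are hypothetical.

```python
import os


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd (Linux only).

    When pid is the calling process, the count includes the descriptor
    that listdir itself opens briefly to read the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns None if the limit is 'unlimited' or the line is not found.
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # fields: Max open files SOFT HARD units
                return None if soft == "unlimited" else int(soft)
    return None


pid = os.getpid()  # substitute the nscd pid when run with enough privilege
print(count_open_fds(pid), soft_fd_limit(pid))
```

Run repeatedly against the nscd pid, the first number should climb toward the second while the worker threads are stuck.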
So there appear to be three separate problems:
(1) timelimit = 0 is default in nss-ldap but /etc/ldap.conf implies it is 30
(2) ldap wait4msg does not recognize the connection is in CLOSE_WAIT
    even with timeout = LDAP_NO_LIMIT
(3) nscd will accept requests (i.e. each using an fd)
    and queue them for the worker threads until it runs out of fds,
    rather than not accepting new requests. It makes no allowance for
    the fact that the worker threads may also need to open files or
    sockets.
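
For problem (3), one possible shape of a fix is an admission check before each accept, so that some descriptors are always held back for the worker threads. This is only a sketch of the idea, not nscd code; can_accept and the reserve of 32 are hypothetical.

```python
def can_accept(open_fds, soft_limit, reserve=32):
    """Return True if one more accepted connection still leaves
    `reserve` descriptors free for worker threads (files, LDAP sockets).

    reserve=32 is an arbitrary illustrative value.
    """
    return open_fds + 1 + reserve <= soft_limit


# An accept loop would consult this before calling accept(); when it
# returns False the loop would stop polling the listening socket for a
# while, leaving new connections in the kernel backlog instead of
# spinning on accept() failures.
print(can_accept(10, 1024))    # True
print(can_accept(1000, 1024))  # False
```

Callers left in the backlog would still time out, but the daemon itself would keep enough descriptors to make progress once the stuck thread is released.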
--
 Douglas E. Engert  <DEEngert@anl.gov>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439
 (630) 252-5444