
Re: ls aborts due to free()ing an invalid pointer

Karl E. Jorgensen wrote:
On Wed, May 02, 2007 at 11:23:21AM -0700, Steven Schlansker wrote:
Hello everyone,
I'm getting a rather strange error while trying to ls a large directory. The setup is as follows:

/home is nfs-mounted from a BSD box
nsswitch is set to use LDAP for passwd, shadow, and group info
nscd is running to cache the responses from LDAP

I try to run ls -l /home, and get the error

steven@soda:~$ ls -l /home
*** glibc detected *** free(): invalid pointer: 0xa7f9ad38 ***
Aborted

Strange..
Questions that might help narrow it down:
- Do other commands (find, shell wildcard expansion) behave strangely
  too?
- Do you get the same error if you omit "-l" ?
- What about "ls --numeric-uid-gid /home"? (This might implicate or eliminate LDAP.)
- Does the same happen if you run the commands on the actual (BSD?) box?
  This would implicate or eliminate NFS...
- Are there any unusual options in /etc/fstab for /home?

It would be nice to narrow it down to one of:
- nfs
- ldap
- specific users/groups
- specific files
- network trouble (unlikely...)
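One way to test the "ldap" and "specific users/groups" cases is to time the name lookups that ls -l performs for each owner and group. The helper below is a hypothetical sketch (it is not part of ls or NSS, and was not in the original thread), assuming Python 3 is available on the client. Python's pwd.getpwuid and grp.getgrgid call the same libc functions as ls, so they go through the sources in nsswitch.conf (and nscd). Because nscd caches answers, the first lookup of each ID is the informative one; the 0.5-second threshold is an arbitrary assumption.

```python
import grp
import os
import pwd
import time


def timed_lookup(func, ident):
    """Call an NSS lookup such as pwd.getpwuid or grp.getgrgid and time it.

    Returns (name, seconds). name is None if the ID did not resolve.
    """
    start = time.perf_counter()
    try:
        name = func(ident)[0]  # index 0 is pw_name / gr_name
    except KeyError:
        name = None
    return name, time.perf_counter() - start


def slow_owner_lookups(directory, threshold=0.5):
    """Return (kind, id, name, seconds) for every owner uid / group gid of the
    entries in `directory` whose lookup took at least `threshold` seconds."""
    uids, gids = set(), set()
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            uids.add(st.st_uid)
            gids.add(st.st_gid)

    slow = []
    for kind, func, ids in (("uid", pwd.getpwuid, uids),
                            ("gid", grp.getgrgid, gids)):
        for ident in sorted(ids):
            name, elapsed = timed_lookup(func, ident)
            if elapsed >= threshold:
                slow.append((kind, ident, name, elapsed))
    return slow
```

For example, `for kind, ident, name, secs in slow_owner_lookups("/home"): print(kind, ident, name, f"{secs:.2f}s")` would list the owners whose lookups are slow. If that list is empty and the crash persists, LDAP looks less likely to be the culprit.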


Looks like it's error reporting from here on...

Does the same happen if you "ls -l" any of joew's files? The strace
output might reveal this, but it may have been a few hundred lines
before the interesting bit...

And finally the trace
[snip]

Sorry about the rather verbose debugging information; I don't really know where to go from here. Any help would be much appreciated!

Verbose is good, especially when it's not random ramblings :-)

Hope this helps

I did some more narrowing down. The problem was almost certainly with LDAP. Our LDAP server was heavily overloaded (a 15-minute load average of 19! I've never seen one that high before...) because we had an index on the wrong attribute: uid instead of uidNumber, while all the queries used uidNumber as their search term.

Apparently, name lookups were taking too long (a few seconds?). Adding a proper index to slapd made the problem go away. Still, it's probably a bug that ls and friends abort if they can't resolve a name within a certain amount of time. Is that intended behavior? Wouldn't it be better to log a timeout and use the numeric ID or something?
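For context: ls already prints the numeric ID when a lookup simply returns no entry, and the "glibc detected ... free(): invalid pointer" message is glibc's allocator aborting the process after it detects an invalid free. That suggests a memory bug (plausibly somewhere in the lookup path under slow responses) rather than a deliberate timeout policy. The fallback proposed above can still be sketched in Python. This is a hypothetical illustration, not how coreutils or nss_ldap behave; the user_label name, the 2-second default, the pool size and the injectable lookup parameter are all assumptions.

```python
import concurrent.futures
import pwd
import sys

# Hypothetical shared pool for lookups. Python threads cannot be killed, so a
# lookup that hangs keeps its worker busy until the libc call returns; if all
# workers are stuck, new lookups queue behind them and also time out.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def user_label(uid, timeout=2.0, lookup=pwd.getpwuid):
    """Return the user name for `uid`, or the numeric ID as a string if the
    lookup finds no entry or does not answer within `timeout` seconds."""
    future = _POOL.submit(lookup, uid)
    try:
        return future.result(timeout=timeout).pw_name
    except KeyError:
        # No entry in any configured NSS source: show the number.
        return str(uid)
    except concurrent.futures.TimeoutError:
        # Lookup still running (e.g. an overloaded LDAP server): log it and
        # fall back to the number instead of waiting indefinitely.
        print(f"warning: lookup of uid {uid} timed out after {timeout}s",
              file=sys.stderr)
        return str(uid)
```

The design trade-off is that a timed-out lookup is abandoned, not cancelled: the worker thread keeps waiting on the server, and the interpreter waits for it at exit. That is one reason tools tend to rely on nscd caching and fast directory servers rather than per-call timeouts.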