
Re: ls aborts due to free()ing an invalid pointer



On Wed, May 02, 2007 at 11:23:21AM -0700, Steven Schlansker wrote:
> Hello everyone,
> I'm having a rather strange error while trying to ls a large directory.  
> The setup is as follows:
> 
> /home is nfs-mounted from a BSD box
> nsswitch is set to use LDAP for passwd, shadow, and group info
> nscd is running to cache the responses from LDAP
> 
> I try to run ls -l /home, and get the error
> 
> steven@soda:~$ ls -l /home
> *** glibc detected *** free(): invalid pointer: 0xa7f9ad38 ***
> Aborted

Strange...

Questions that might help narrow it down:
- Do other commands (find, shell wildcard expansion) behave strangely
  too?
- Do you get the same error if you omit "-l"?
- What about "ls --numeric-uid-gid /home"? (might blame/eliminate LDAP)
- Does the same happen if you run the commands on the actual (BSD?) box?
  This would eliminate/blame NFS...
- Any out-of-the-ordinary options in /etc/fstab for /home?

It would be nice to narrow it down to one of:
- NFS
- LDAP
- specific users/groups
- specific files
- network trouble (unlikely...)
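One way to chase the "specific users/groups" and "specific files" cases is to repeat the owner/group lookups that ls -l performs, one directory entry at a time. The sketch below is a hypothetical helper (`walk` and `lookup_owner` are illustrative names, not existing tools). It assumes Python 3.6+ and that Python's `pwd`/`grp` modules resolve names through the same glibc NSS path (nsswitch, nscd, LDAP) that ls uses. It does not reproduce everything ls -l does (for example the ACL `getxattr` calls seen in the trace), so a clean run would not fully clear nscd/LDAP.

```python
import grp
import os
import pwd


def lookup_owner(uid, gid):
    """Resolve a uid/gid pair to names via getpwuid()/getgrgid().

    Falls back to the numeric id when no matching entry exists.
    """
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = str(uid)
    try:
        group = grp.getgrgid(gid).gr_name
    except KeyError:
        group = str(gid)
    return user, group


def walk(directory, resolve=True):
    """Return (name, owner, group) for each entry of `directory`.

    Each entry's numeric ids are printed and flushed *before* the name
    lookup, so if the process aborts inside the NSS/nscd code, the last
    line on the terminal identifies the entry being resolved.
    resolve=False skips the lookups, roughly like "ls --numeric-uid-gid".
    """
    rows = []
    for name in sorted(os.listdir(directory)):
        st = os.lstat(os.path.join(directory, name))
        print(f"{name}: uid={st.st_uid} gid={st.st_gid}", flush=True)
        if resolve:
            owner, group = lookup_owner(st.st_uid, st.st_gid)
        else:
            owner, group = str(st.st_uid), str(st.st_gid)
        rows.append((name, owner, group))
    return rows
```

Calling `walk("/home")` should either finish or abort right after printing the entry whose lookup triggers the crash. If `walk("/home", resolve=False)` runs cleanly, that suggests the directory listing itself (the NFS side) is fine.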

...

> Strace reveals:
> 
> ...lots and lots and lots of lookups...
> connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
> poll([{fd=4, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}], 1, 5000) = 1
> writev(4, [{"\2\0\0\0\1\0\0\0\5\0\0\0", 12}, {"9954\0", 5}], 2) = 17
> poll([{fd=4, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN|POLLHUP}], 1, 5000) = 1
> read(4, "\2\0\0\0\1\0\0\0\10\0\0\0\2\0\0\0\342&\0\0\350\3\0\0\17"..., 36) = 36
> read(4, "joeshaw\0*\0Joseph C. Shaw\0/home/a"..., 70) = 70
> close(4)                                = 0
> lstat64("joew", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
> getxattr("joew", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
> socket(PF_FILE, SOCK_STREAM, 0)         = 4
> fcntl64(4, F_GETFL)                     = 0x2 (flags O_RDWR)
> fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
> connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
> poll([{fd=4, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}], 1, 5000) = 1
> writev(4, [{"\2\0\0\0\1\0\0\0\6\0\0\0", 12}, {"10182\0", 6}], 2) = 18
> poll([{fd=4, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 5000) = 1
> read(4, "\2\0\0\0\1\0\0\0\5\0\0\0\2\0\0\0\306\'\0\0\350\3\0\0\17"..., 36) = 36
> read(4, "joew\0*\0Joe Wahrhaftig\0/home/apol"..., 64) = 64
> close(4)                                = 0

Looks like it's error reporting from here on...

Does the same happen if you "ls -l" any of joew's files? The strace
output might reveal this, but it may have been a few hundred lines
before the interesting bit...
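The binary nscd exchanges in the quoted trace can also be decoded to see which uid each lookup was for, which helps check whether the last lookup before the abort was joew's. This is a sketch under stated assumptions: the field layout and the request-type numbers are an assumed reading of glibc's nscd client protocol, not something confirmed in this thread, and all names are hypothetical. The reading does agree with the trace: joew's response header decodes to pw_name_len=5 ("joew\0"), pw_uid=10182 (the requested key) and pw_gid=1000.

```python
import struct

# ASSUMPTION: layouts follow glibc's nscd client protocol as understood here;
# every field is a 32-bit int, little-endian on the i386 box in the trace.
REQUEST_FIELDS = ("version", "type", "key_len")
PW_RESPONSE_FIELDS = ("version", "found", "pw_name_len", "pw_passwd_len",
                      "pw_uid", "pw_gid", "pw_gecos_len", "pw_dir_len",
                      "pw_shell_len")
# ASSUMPTION: request-type numbers, listed for the two passwd lookups only.
REQUEST_TYPES = {0: "GETPWBYNAME", 1: "GETPWBYUID"}


def decode_ints(data, fields):
    """Decode as many whole 32-bit fields as `data` holds.

    strace truncates long buffers ("..."), so a partial header is accepted.
    """
    count = min(len(data) // 4, len(fields))
    values = struct.unpack(f"<{count}i", data[:count * 4])
    return dict(zip(fields, values))


def decode_request(header):
    """Decode the 12-byte request header that is written before the key."""
    req = decode_ints(header, REQUEST_FIELDS)
    req["type_name"] = REQUEST_TYPES.get(req.get("type"), "unknown")
    return req


if __name__ == "__main__":
    # Bytes copied from the quoted strace for joew's lookup (key "10182\0").
    print(decode_request(b"\2\0\0\0\1\0\0\0\6\0\0\0"))
    # -> {'version': 2, 'type': 1, 'key_len': 6, 'type_name': 'GETPWBYUID'}
    print(decode_ints(b"\2\0\0\0\1\0\0\0\5\0\0\0\2\0\0\0\306\'\0\0\350\3\0\0\17",
                      PW_RESPONSE_FIELDS))
    # -> pw_name_len 5, pw_uid 10182, pw_gid 1000 (gecos length is cut off)
```

Feeding it the writev/read buffers from the lines just before the abort would show which uid (and, with matching group types, which gid) was being resolved when free() failed.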

> And finally the trace
[snip]

> Sorry about the rather verbose debugging information, I don't really 
> know where to proceed from here.  Any help would be much appreciated!  

Verbose is good, especially when it's not random ramblings :-)

Hope this helps
-- 
Karl E. Jorgensen
karl@jorgensen.org.uk  http://www.jorgensen.org.uk/
karl@jorgensen.com     http://karl.jorgensen.com
==== Today's fortune:
One small step for man, one giant stumble for mankind.


