
Re: ls aborts due to free()ing an invalid pointer



On Thu, May 03, 2007 at 11:55:49AM -0700, Steven Schlansker wrote:
> Karl E. Jorgensen wrote:
> >On Wed, May 02, 2007 at 11:23:21AM -0700, Steven Schlansker wrote:
> >>I'm having a rather strange error while trying to ls a large directory.  
> >>The setup is as follows:
> >>
> >>/home is nfs-mounted from a BSD box
> >>nsswitch is set to use LDAP for passwd, shadow, and group info
> >>nscd is running to cache the responses from LDAP
> >>
> >>I try to run ls -l /home, and get the error
> >>
> >>steven@soda:~$ ls -l /home
> >>*** glibc detected *** free(): invalid pointer: 0xa7f9ad38 ***
> >>Aborted
> >
> >Questions that might help narrow it down:
> >- Do other commands (find, shell wildcard expansion) behave strangely
> >  too?
> >- Do you get the same error if you omit "-l" ?
> >- What about "ls --numeric-uid-gid /home" ? (might blame/eliminate ldap)
> >- Does the same happen if you run the commands on the actual (BSD?) box?
> >  This would eliminate/blame NFS...
> >- Any out-of-the-ordinary options in /etc/fstab for /home ?
> >
> >It would be nice to narrow it down to one of:
> >- nfs
> >- ldap
> >- specific users/groups
> >- specific files
> >- network trouble (unlikely...)

[snip]

> I did some more narrowing down.  The problem was almost certainly with 
> LDAP.  Our LDAP server was heavily overloaded (19!  Never seen a 
> 15-minute load average that high before...) because we had an index on 
> the wrong key (uid instead of uidNumber, and all the queries used 
> uidNumber as their search term)

19 is workable, but it starts to hurt around there. I once had one of my
boxes up to 78 (I didn't want to reboot, as that would lose both the
uptime counter and a diagnostic opportunity).

> So what was apparently happening was that name lookups were taking too
> long (a few seconds?).  Adding a proper index to slapd made the problem
> go away.  It's probably a bug, though, that ls and friends would abort
> if they couldn't resolve the name in a certain amount of time - is that
> intended behavior?

I suspect that this is *not* the intended behaviour of ls :-) Sounds
like there's an obscure bug somewhere there...

> Wouldn't it be better to log a timeout and use the numeric ID or
> something?

I concur. But setting up a testcase for it might require a bit of
work - might not be worth it for such an obscure bug...
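
For what it's worth, the fallback suggested above (give up on a slow
lookup and print the numeric ID) could look roughly like the sketch
below. This is Python for illustration only, not anything taken from ls
or glibc. The names owner_label, slow_lookup, missing_lookup and
fast_lookup, and the 2-second default timeout, are all made up here;
only pwd.getpwuid and concurrent.futures are real library pieces.

```python
import concurrent.futures
import pwd
import time
import types

# Shared worker pool so the caller can stop waiting on a slow lookup.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def owner_label(uid, lookup=pwd.getpwuid, timeout=2.0):
    """Return the user name for uid, or str(uid) if the lookup
    times out or finds no entry (hypothetical helper)."""
    future = _pool.submit(lookup, uid)
    try:
        return future.result(timeout=timeout).pw_name
    except concurrent.futures.TimeoutError:
        # Name service too slow (e.g. an overloaded LDAP server):
        # fall back to the numeric ID instead of failing.
        return str(uid)
    except KeyError:
        # pwd.getpwuid raises KeyError when no entry exists.
        return str(uid)


# Hypothetical stand-ins for a name service, used to exercise the fallback.
def slow_lookup(uid):
    time.sleep(0.2)  # pretend the directory server is overloaded
    return types.SimpleNamespace(pw_name="steven")


def missing_lookup(uid):
    raise KeyError(uid)


def fast_lookup(uid):
    return types.SimpleNamespace(pw_name="steven")
```

Note that the abandoned lookup keeps running in its worker thread; the
caller just stops waiting for it. A real fix would also want to log the
timeout, as suggested above.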

-- 
Karl E. Jorgensen
karl@jorgensen.org.uk  http://www.jorgensen.org.uk/
karl@jorgensen.com     http://karl.jorgensen.com
==== Today's fortune:
Before destruction a man's heart is haughty, but humility goes before honour.
		-- Proverbs 18:12
