
Re: depending on a customized library



On 5/17/05, Hubert Chan <hubert@uhoreg.ca> wrote:
> >>>>> "Michael" == Michael K Edwards <m.k.edwards@gmail.com> writes:
> 
> Michael> I'd be surprised if it's that bad under NPTL, and if it is, I'd
> Michael> be surprised if it can't be substantially improved (at least on
> Michael> x86) with a little bit of oprofile work.  When was that
> Michael> performance comparison done?
> 
> The libgc dependency was only added recently, so upstream's performance
> comparison was done within the last couple of weeks.  I just tried it
> out myself, and my own informal tests seem to agree with upstream's
> numbers.  (I'm running sid, last updated a couple of weeks ago, on a
> 2.6.10 kernel.)

This is going to sound stupid, but have you tried it with either glibc
2.3.4 or Ubuntu's modified glibc?  There are some threading-related
issues that I know were addressed very late in the Hoary release cycle
(as far as I know, primarily pthread_cancel semantics, but they may have
performance implications too), and the fix may not be in sid.  If
it's inconvenient to compare, don't worry about it; glibc 2.3.4 will
hit sid not long after sarge releases, right?
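Since the answer depends on exactly which C library and threading
implementation a benchmark ran against, it may help to have the
benchmark report that itself.  A minimal sketch, assuming a glibc-based
system: gnu_get_libc_version() and confstr(_CS_GNU_LIBPTHREAD_VERSION)
are real glibc interfaces, while report_libc() and get_pthread_impl()
are hypothetical helper names for illustration.

```c
#define _GNU_SOURCE
#include <gnu/libc-version.h>  /* gnu_get_libc_version(), gnu_get_libc_release() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>            /* confstr() */

/* Hypothetical helper: writes the threading implementation glibc reports
   (NPTL or LinuxThreads, plus a version) into buf.
   Returns 0 on success, -1 if confstr() does not recognize the name. */
int get_pthread_impl(char *buf, size_t len)
{
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    return n == 0 ? -1 : 0;
}

/* Hypothetical helper: prints the glibc version and threading library,
   so each benchmark result is tagged with what it actually ran on. */
void report_libc(void)
{
    char pthread_ver[64];

    printf("glibc %s (%s)\n", gnu_get_libc_version(), gnu_get_libc_release());
    if (get_pthread_impl(pthread_ver, sizeof pthread_ver) == 0)
        printf("threads: %s\n", pthread_ver);
    else
        printf("threads: unknown\n");
}
```

Calling report_libc() at the top of the benchmark's main() makes it
obvious when two runs were not on the same glibc.  From a shell,
`getconf GNU_LIBPTHREAD_VERSION` gives the same threading information.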

Just to check:  you are using the same compilation and linking scheme
for both builds you are benchmarking, right?  -fPIC code and the PLT
indirection on each call into a shared library can add more than a
little overhead to a slab-ish malloc() that usually serves requests
straight from a free list.  Oh, and are there things
in the header files that change from macros / inline functions to real
function calls when you switch on threading?
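To keep the comparison honest, the timing harness itself should be
trivial and compiled identically for every configuration, so that only
the allocator and threading setup differ.  A minimal sketch, assuming a
POSIX system with clock_gettime(); bench_malloc_free() is a hypothetical
name, and plain malloc()/free() stand in for whichever allocator is
under test.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

/* Storing each pointer here keeps the compiler from eliding the
   malloc()/free() pair as dead code. */
static void *volatile sink_ptr;

/* Hypothetical harness: times `iters` malloc/free pairs of `size` bytes
   and returns the mean cost per pair in nanoseconds (-1.0 if malloc
   fails, 0.0 if iters is 0).  Build it with exactly the same flags
   (-O level, -fPIC or not, static or shared linking) for every
   configuration being compared. */
double bench_malloc_free(size_t iters, size_t size)
{
    struct timespec start, end;

    if (iters == 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iters; i++) {
        unsigned char *p = malloc(size);
        if (p == NULL)
            return -1.0;
        p[0] = (unsigned char)i;  /* touch the block */
        sink_ptr = p;
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iters;
}
```

For libgc, the loop body would call GC_MALLOC() from gc.h instead, with
no matching free.  On older glibc releases, clock_gettime() may need
-lrt at link time.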

Note also the tuning issues discussed in
http://www.hpl.hp.com/personal/Hans_Boehm/gc/scale.html , which
includes benchmarks done on a now ancient kernel (2.2.12).  If you are
using -DPARALLEL_MARK on an SMP (or hyperthreaded) machine, and the
scheduler gets the processor affinity wrong for the dedicated marking
threads, I could see that having unfortunate performance consequences.
If you have compiled with -DTHREAD_LOCAL_ALLOC but are not using the
API in gc_local_alloc.h, you will be hurting.
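One crude way to test the affinity theory is to rerun the benchmark with
the whole process pinned to a single CPU and see whether the numbers move.
A minimal Linux-specific sketch using sched_getaffinity() and
sched_setaffinity(); pin_to_first_allowed_cpu() is a hypothetical helper.
This assumes it runs early in main(), before the collector starts any
marker threads, since new threads inherit the creating thread's mask.
Note that pinning also removes any benefit of parallel marking, so it
isolates scheduling effects rather than reproducing a tuned setup.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: restricts the calling thread to the
   lowest-numbered CPU it is currently allowed to run on.
   Returns that CPU number, or -1 on failure.  Linux-specific. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;  /* empty mask; should not happen */
}
```

Running the unmodified binary under `taskset -c 0` is a quicker way to
try the same experiment without touching the code.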

In general, I think that the tool you need for this purpose is
oprofile.  You want to see where threads are sleeping (if at all), how
much time is spent fiddling with spinlocks, etc.  I'm a novice with
oprofile but I'll be needing to learn about it Real Soon Now.  Ryan,
have you had occasion to throw this or other profiling tools at libgc?

Cheers,
- Michael


