
Re: depending on a customized library

On 5/18/05, Hubert Chan <hubert@uhoreg.ca> wrote:
> >>>>> "Michael" == Michael K Edwards <m.k.edwards@gmail.com> writes:
> Commenting out those lines, and compiling multi-threaded, gives
> performance similar to the single-threaded case.  So what does this
> mean?  I doubt that Ryan will want to disable THREAD_LOCAL_ALLOC
> Debian-wide.

It means someone ought to beat on the spin-then-queue locking
implementation enabled by THREAD_LOCAL_ALLOC until it is no longer
slower than the non-threaded build in the common single-threaded case.
That's really a job for oprofile, which I'm just starting to get up to
speed on; but code inspection, informed by some knowledge of NPTL,
might be enough.
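
For readers unfamiliar with the term, here is a minimal sketch of a
spin-then-queue lock in C with POSIX threads: try the mutex a bounded
number of times, then block in pthread_mutex_lock(). The names
(spin_then_block_lock, SPIN_LIMIT, run_counter_demo) are hypothetical,
and this is not the collector's actual locking code. In a lock like
this one, the uncontended single-threaded case succeeds on the first
trylock, so a single-threaded slowdown would have to come from overhead
around the lock rather than from the spinning itself.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical tuning knob: trylock attempts before blocking. */
#define SPIN_LIMIT 100

/* Spin-then-queue acquire: retry trylock a bounded number of times
   (cheap if the holder is about to release), then fall back to a
   blocking lock, which queues the thread until the mutex is free. */
void spin_then_block_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;               /* uncontended: first try succeeds */
    }
    pthread_mutex_lock(m);        /* contended: sleep until available */
}

/* Small harness: several threads bump a shared counter under the lock. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;
static long iterations_per_thread = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations_per_thread; i++) {
        spin_then_block_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
long run_counter_demo(int nthreads, long iters)
{
    pthread_t tids[16];
    assert(nthreads >= 1 && nthreads <= 16);   /* fixed-size thread array */
    counter = 0;
    iterations_per_thread = iters;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, worker, NULL);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    return counter;
}
```

A production lock would usually spin on a plain read of the lock word
(or add a CPU pause hint) rather than hammering trylock, to avoid
bouncing the cache line between CPUs.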

By the way, if you want to use oprofile, you might as well use the
0.8.2 release.  apt-get source oprofile will get you 0.8.1, so:

   1. Grab the 0.8.2 upstream tarball and unpack it.
   2. Grab the 0.8.2 release notes and put them in ./ReleaseNotes.
   3. Copy over ./AUTHORS and ./debian from the 0.8.1 tree.
   4. Add a debian/changelog entry.
   5. Run ./autogen.sh (use autoconf 2.59 and automake 1.7.9).
   6. Propagate over the doc fixes, if you want.
   7. Run dpkg-buildpackage -rfakeroot, and you're good to go.

The oprofile module is part of stock 2.6.x kernels.  If you want kernel
profiling, you have to rebuild the kernel with install_vmlinux set in
/etc/kernel-pkg.conf; for userspace stuff the stock kernel should be OK.

> I also tried compiling with THREAD_LOCAL_ALLOC and using
> GC_local_malloc instead of GC_malloc, but performance is similar to just
> using GC_malloc.

From http://www.hpl.hp.com/personal/Hans_Boehm/gc/scale.html :

The easiest way to switch an application to thread-local allocation is to

   1. Define the macro GC_REDIRECT_TO_LOCAL, and then include the gc.h
      header in each client source file.
   2. Invoke GC_thr_init() before any allocation.
   3. Allocate using GC_MALLOC, GC_MALLOC_ATOMIC, and/or GC_GCJ_MALLOC.
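
The recipe above can be modelled in a few lines. This is a sketch with
hypothetical stand-ins (shared_malloc, local_malloc, my_thr_init,
MY_REDIRECT_TO_LOCAL, MY_MALLOC), not the real gc.h API: defining the
redirect macro before the allocation macro is set up rebinds MY_MALLOC
to the per-thread allocator, which takes no lock, and the per-thread
path refuses to run until the init function has been called.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the gc.h API:
   shared_malloc() ~ a global allocator guarded by one lock
   local_malloc()  ~ a per-thread allocator that needs no lock
   my_thr_init()   ~ the init call required before any allocation */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static int thr_init_done = 0;
unsigned long shared_calls = 0;
__thread unsigned long local_calls = 0;  /* one counter per thread */

void my_thr_init(void) { thr_init_done = 1; }

void *shared_malloc(size_t n)
{
    pthread_mutex_lock(&shared_lock);    /* every call takes the lock */
    shared_calls++;
    void *p = calloc(1, n);
    pthread_mutex_unlock(&shared_lock);
    return p;
}

void *local_malloc(size_t n)
{
    assert(thr_init_done);               /* step 2 must come first */
    local_calls++;                       /* thread-private: no lock */
    return calloc(1, n);
}

/* Step 1 analogue: the client defines the redirect macro before this
   "header" section, so MY_MALLOC expands to the per-thread allocator. */
#define MY_REDIRECT_TO_LOCAL
#ifdef MY_REDIRECT_TO_LOCAL
#  define MY_MALLOC(n) local_malloc(n)
#else
#  define MY_MALLOC(n) shared_malloc(n)
#endif
```

In this model, step 1 is purely a compile-time rebinding; any speed
difference comes from the unlocked per-thread path versus the locked
shared one, not from the macro itself.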

Oddly, -DPARALLEL_MARK may improve the situation for UP thread-local
allocation, because it results in the use of an implementation of
GC_malloc_many (used to refill thread-local free lists) that may be
better tuned for thread-local usage patterns (as well as more
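
To make the GC_malloc_many remark concrete: the thread-local fast path
pops objects from a per-thread free list with no locking, and only when
the list runs dry does it take the global lock to fetch a whole batch.
The following is a hypothetical sketch under simplifying assumptions
(one 32-byte size class, calloc standing in for the collector's heap,
objects never reclaimed); refill_batch, local_alloc and BATCH are
illustrative names, not gc's internals.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define OBJ_SIZE 32   /* single size class, for simplicity */
#define BATCH    64   /* objects fetched per refill */

struct obj {
    struct obj *next;                          /* free-list link */
    char payload[OBJ_SIZE - sizeof(struct obj *)];
};

/* Stands in for the collector's allocation lock; calloc itself
   would not need it. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
unsigned long refills = 0;                     /* global lock acquisitions */
static __thread struct obj *free_list = NULL;  /* per-thread free list */

/* Slow path (the role GC_malloc_many plays in the quoted text): take
   the global lock once and return a whole batch as a linked list. */
struct obj *refill_batch(void)
{
    pthread_mutex_lock(&global_lock);
    struct obj *batch = calloc(BATCH, sizeof *batch);
    refills++;
    pthread_mutex_unlock(&global_lock);
    if (batch == NULL)
        return NULL;
    for (int i = 0; i < BATCH - 1; i++)
        batch[i].next = &batch[i + 1];         /* thread the batch */
    batch[BATCH - 1].next = NULL;
    return batch;
}

/* Fast path: pop from this thread's list with no locking at all. */
void *local_alloc(void)
{
    if (free_list == NULL) {
        free_list = refill_batch();
        if (free_list == NULL)
            return NULL;
    }
    struct obj *o = free_list;
    free_list = o->next;
    return o;
}
```

With BATCH at 64, 200 allocations from one thread take the global lock
only 4 times.  The flip side is that every refill sits on the
allocation path, so an inefficient batch-refill routine hurts
thread-local allocation directly.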

Care to give that a shot?

- Michael
