Re: depending on a customized library
On 5/18/05, Hubert Chan <firstname.lastname@example.org> wrote:
> >>>>> "Michael" == Michael K Edwards <email@example.com> writes:
> Commenting out those lines, and compiling multi-threaded, gives
> performance similar to the single-threaded case. So what does this
> mean? I doubt that Ryan will want to disable THREAD_LOCAL_ALLOC
It means someone ought to beat on the spin-then-queue locking
implementation enabled by THREAD_LOCAL_ALLOC until it isn't retrograde
for the common single-threaded case. That's really a job for
oprofile, which I'm starting to get spun up on now; but code
inspection, informed by some knowledge about NPTL, might be enough.
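
For anyone who hasn't looked at this kind of lock, here is a minimal
sketch of the general spin-then-queue pattern on top of pthreads. It is
not the collector's actual code, which I haven't profiled yet; the names
sq_lock, sq_acquire, sq_release and the SPIN_LIMIT value are all made up
for illustration. The idea is to try the lock a bounded number of times
without sleeping, and only then fall back to a blocking acquire, where
the kernel queues the waiter.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical tuning knob: attempts made before giving up and blocking. */
#define SPIN_LIMIT 100

struct sq_lock {
    pthread_mutex_t mutex;
};

static void sq_init(struct sq_lock *l)
{
    int rc = pthread_mutex_init(&l->mutex, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Acquire the lock. Returns the number of failed attempts made before
 * the lock was obtained; 0 means the first try succeeded. */
static int sq_acquire(struct sq_lock *l)
{
    int failed;

    /* Spin phase: non-blocking attempts, yielding the CPU in between. */
    for (failed = 0; failed < SPIN_LIMIT; failed++) {
        if (pthread_mutex_trylock(&l->mutex) == 0)
            return failed;
        sched_yield();
    }

    /* Queue phase: block until the current holder releases the lock. */
    pthread_mutex_lock(&l->mutex);
    return failed;
}

static void sq_release(struct sq_lock *l)
{
    int rc = pthread_mutex_unlock(&l->mutex);
    assert(rc == 0);
    (void)rc;
}
```

In a single-threaded program sq_acquire never gets past its first
trylock, so whatever slowdown remains is the cost of that fast path
plus the surrounding bookkeeping, which is the part a profile should
be able to pin down.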
By the way, if you want to use oprofile, you might as well use the
0.8.2 release. apt-get source oprofile will get you 0.8.1. To package
0.8.2 instead: grab the 0.8.2 upstream and unpack it, grab the 0.8.2
release notes and put them in ./ReleaseNotes, copy over ./AUTHORS and
./debian from the 0.8.1 tree, add a debian/changelog entry, run
./autogen.sh (use autoconf 2.59 and automake 1.7.9), propagate over the
doc fixes if you want, and run dpkg-buildpackage -rfakeroot. Then
you're good to go. The oprofile module is part of stock 2.6.x kernels;
you have to rebuild with install_vmlinux in /etc/kernel-pkg.conf if you
want kernel profiling, but for userspace stuff the stock kernel should
be OK.
> I also tried compiling with THREAD_LOCAL_ALLOC, but using
> GC_local_malloc instead of GC_malloc, but performance is similar to just
> using GC_malloc.
From http://www.hpl.hp.com/personal/Hans_Boehm/gc/scale.html :
The easiest way to switch an application to thread-local allocation is to
1. Define the macro GC_REDIRECT_TO_LOCAL, and then include the gc.h
header in each client source file.
2. Invoke GC_thr_init() before any allocation.
3. Allocate using GC_MALLOC, GC_MALLOC_ATOMIC, and/or GC_GCJ_MALLOC.
Oddly, -DPARALLEL_MARK may improve the situation for UP thread-local
allocation, because it results in the use of an implementation of
GC_malloc_many (used to refill thread-local free lists) that may be
better tuned for thread-local usage patterns (as well as more
Care to give that a shot?