Re: Slow Xorg performance on dual Opteron + Radeon, Jessie 64-bit
Thanks for the ongoing suggestions. Some more testing and results:
`grep render /var/log/Xorg.0.log`:
[ 8898.907] (II) RADEON(0): Direct rendering enabled
`glxinfo | grep render` gives:
direct rendering: Yes
OpenGL renderer string: Gallium 0.4 on ATI R420
I've tested with an xorg.conf and without. Without, it comes up in
1024x768 resolution and the problem seems less severe, but it's still
noticeable, and Xorg still spends an awful lot of time in
__memcpy_sse2_unaligned.
While doing some further reading and searching, I discovered that Debian
only recently switched back to using glibc, after using eglibc since
about 2009. Since my first suspicions were around libc and possible
performance regressions there, I thought I'd take a closer look, using
the following test program:
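<<
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const int SIZE = 1048576 * 1024;  /* 1 GiB per buffer */
const int REPS = 20;

int main(void)
{
    int i;
    void *chunk_a = malloc(SIZE);
    void *chunk_b = malloc(SIZE);

    /* Two 1 GiB allocations can fail on a small machine. */
    if (chunk_a == NULL || chunk_b == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }
    /* Copy the whole buffer REPS times; run under time(1). */
    for (i = 0; i < REPS; i++)
        memcpy(chunk_b, chunk_a, SIZE);
    free(chunk_a);
    free(chunk_b);
    return 0;
}
>>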
I found a .deb of glibc-2.17 for comparison (__memcpy_sse2_unaligned was
added between glibc-2.17 and glibc-2.18) and used LD_PRELOAD to load
that version instead of Jessie's native glibc-2.19.
time LD_PRELOAD="/tmp/lib/x86_64-linux-gnu/ld-2.17.so /tmp/lib/x86_64-linux-gnu/libc-2.17.so" ./memcpy-benchmark
real 0m12.303s
user 0m11.456s
sys 0m0.844s
And using the system libc:
time ./memcpy-benchmark
real 0m22.509s
user 0m21.504s
sys 0m0.992s
So Jessie's standard libc-2.19 takes about 1.8 times as long as 2.17
(22.5 s versus 12.3 s) on the same hardware and otherwise identical OS.
I don't know whether this accounts entirely for the poor performance
I'm seeing, but it probably doesn't help.
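The figures from time(1) include process start-up, allocation and page
faults as well as the copies themselves. A tighter measurement would
time only the memcpy loop. The following is a minimal sketch of that
idea, not the program used for the numbers above: the helper names
now_seconds and memcpy_mib_per_sec, the src_offset parameter and the
fill patterns are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds from a monotonic clock, unaffected by wall-clock changes. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: copy `size` bytes `reps` times with memcpy and
 * return the throughput in MiB/s. Returns a negative value if an
 * allocation fails. `src_offset` shifts the source pointer by that many
 * bytes, so a value such as 1 forces a misaligned source.
 */
static double memcpy_mib_per_sec(size_t size, int reps, size_t src_offset)
{
    assert(size > 0 && reps > 0);

    unsigned char *src = malloc(size + src_offset);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page first so page faults fall outside the timed loop. */
    memset(src, 0xA5, size + src_offset);
    memset(dst, 0x5A, size);

    double start = now_seconds();
    for (int i = 0; i < reps; i++)
        memcpy(dst, src + src_offset, size);
    double elapsed = now_seconds() - start;

    /* Read one copied byte so the result of the copies is actually used. */
    volatile unsigned char sink = dst[size - 1];
    (void)sink;

    free(src);
    free(dst);

    if (elapsed <= 0.0)
        return 0.0;  /* clock too coarse to measure this run */
    return ((double)size * reps / (1024.0 * 1024.0)) / elapsed;
}
```

Calling memcpy_mib_per_sec(64 << 20, 20, 0) and
memcpy_mib_per_sec(64 << 20, 20, 1) from main, once under each libc,
would give one aligned and one misaligned figure per library to compare.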
Might this be worth taking up with the GNU libc people? My guess is it
might be due to slow SSE2 execution on this old AMD64 CPU, whereas the
glibc code is probably optimised for much newer hardware.
Thanks again,
Chris