
Re: Slow Xorg performance on dual Opteron + Radeon, Jessie 64-bit



Thanks for the ongoing suggestions.  Some more testing and results:

`grep render /var/log/Xorg.0.log`:
[  8898.907] (II) RADEON(0): Direct rendering enabled

`glxinfo | grep render` gives:
direct rendering: Yes
OpenGL renderer string: Gallium 0.4 on ATI R420

I've tested with an xorg.conf and without. Without, it comes up in 1024x768 resolution and the problem seems less severe, but it's still noticeable, and Xorg still spends an awful lot of time in __memcpy_sse2_unaligned.

While doing some further reading and searching, I discovered that Debian only recently switched back to glibc, after using eglibc since about 2009. Since I had first suspected a performance regression in libc, I thought I'd take a closer look, using the following test program:

<<
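#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy a 1 GiB buffer REPS times. */
static const size_t SIZE = (size_t)1024 * 1024 * 1024;
static const int REPS = 20;

int main(void) {
        int i;
        void *chunk_a = malloc(SIZE);
        void *chunk_b = malloc(SIZE);

        if (chunk_a == NULL || chunk_b == NULL) {
                perror("malloc");
                return 1;
        }

        for (i = 0; i < REPS; i++)
                memcpy(chunk_b, chunk_a, SIZE);

        free(chunk_a);
        free(chunk_b);
        return 0;
}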
>>

I found a .deb of glibc-2.17 for comparison (__memcpy_sse2_unaligned was added between glibc-2.17 and glibc-2.18) and used LD_PRELOAD to load that version instead of Jessie's native glibc-2.19.

time LD_PRELOAD="/tmp/lib/x86_64-linux-gnu/ld-2.17.so /tmp/lib/x86_64-linux-gnu/libc-2.17.so" ./memcpy-benchmark

real    0m12.303s
user    0m11.456s
sys     0m0.844s
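With LD_PRELOAD it may be worth confirming that the preloaded 2.17 library is really the one servicing calls. A minimal check, assuming a glibc system, is a tiny program that prints what glibc's own gnu_get_libc_version() (declared in <gnu/libc-version.h>) reports, run once with the same LD_PRELOAD prefix as the benchmark and once without:

<<
#include <gnu/libc-version.h>
#include <stdio.h>

int main(void) {
        /* glibc-specific: reports the version of the libc that the
           dynamic linker actually bound these calls to. */
        printf("glibc version: %s\n", gnu_get_libc_version());
        printf("glibc release: %s\n", gnu_get_libc_release());
        return 0;
}
>>

If both runs report the same version, the preload did not take effect and the comparison would need redoing.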


And using the system libc:

time ./memcpy-benchmark

real    0m22.509s
user    0m21.504s
sys     0m0.992s


So with Jessie's standard glibc 2.19 the benchmark takes roughly 1.8 times as long as with 2.17 (22.5 s versus 12.3 s wall-clock) on the same hardware and an otherwise identical OS. I don't know whether this accounts entirely for the poor performance I'm seeing, but it probably doesn't help. Might this be worth taking up with the GNU libc people? My guess is that SSE2 code executes slowly on this old AMD64 CPU, whereas the newer glibc code is probably optimised for much more recent hardware.
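To put the two timings on a common scale, here is a sketch of a tighter variant of the benchmark. It is a hypothetical rewrite, not the program that produced the numbers above: it faults in both buffers before timing, times only the copy loop with clock_gettime(CLOCK_MONOTONIC), and reports throughput in GiB/s. The 256 MiB buffer size and the helper names elapsed() and gib_per_s() are illustrative assumptions.

<<
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed(const struct timespec *t0, const struct timespec *t1) {
        return (double)(t1->tv_sec - t0->tv_sec)
             + (double)(t1->tv_nsec - t0->tv_nsec) / 1e9;
}

/* Copy throughput in GiB/s for `reps` copies of `bytes` each. */
static double gib_per_s(size_t bytes, int reps, double seconds) {
        return (double)bytes * reps / (1024.0 * 1024.0 * 1024.0) / seconds;
}

int main(void) {
        const size_t size = (size_t)256 * 1024 * 1024; /* assumed 256 MiB */
        const int reps = 20;
        struct timespec t0, t1;
        double s;
        int i;
        char *src = malloc(size), *dst = malloc(size);

        if (src == NULL || dst == NULL) {
                perror("malloc");
                return 1;
        }

        /* Fault in both buffers first so page faults are not timed. */
        memset(src, 1, size);
        memset(dst, 0, size);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < reps; i++)
                memcpy(dst, src, size);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        s = elapsed(&t0, &t1);
        /* Reading dst back keeps the copies from being optimised away. */
        printf("%.3f s, %.2f GiB/s (dst[0]=%d)\n",
               s, gib_per_s(size, reps, s), dst[0]);

        free(src);
        free(dst);
        return 0;
}
>>

Applying the same arithmetic to the wall-clock times posted above (20 copies of 1 GiB) gives roughly 1.63 GiB/s with glibc 2.17 and 0.89 GiB/s with 2.19, although those figures also include the page-fault and startup time reflected in the sys column.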

Thanks again,
Chris

