Re: [xine-user] [ANN] PowerPC Assembly Patch
On May 25 2002, Andrew Patrikalakis wrote:
> Hello all,
Hi, Andrew.
> With all the recent talk of use of assembly on the PowerPC, I came
> up with a patch to use assembly versions of memcpy. It's about 35%
> faster. Here is a sample of the memcpy speed test (which also now
> works):
Thanks for helping xine get better on PowerPC.
And also thanks for looking into my earlier message and
discovering which part of the code was giving the relocation
error (I just read your patch).
> Benchmarking memcpy methods (smaller is better):
> glibc memcpy() : 136
> ppcasm_memcpy() : 137
> ppcasm_cacheable_memcpy() : 88
> xine: using ppcasm_cacheable_memcpy()
> (The lower time resolution is because I'm using times(NULL) in rdtsc())
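For anyone who wants to repeat the comparison outside xine, a
timing loop along these lines should be enough. This is only a
minimal sketch, not the code from the patch: the names copy_fn and
time_copy are made up for illustration, and it assumes a POSIX
system where times() is available.

```c
#include <stddef.h>
#include <string.h>
#include <sys/times.h>

/* Signature shared by memcpy() and any candidate replacement. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical helper (not from xine or the patch): runs `fn`
 * `reps` times over an n-byte buffer and returns the elapsed clock
 * ticks reported by times(). The ticks are coarse, which is why the
 * numbers quoted above have such a low resolution. */
clock_t time_copy(copy_fn fn, void *dst, const void *src,
                  size_t n, int reps)
{
    struct tms unused;  /* POSIX expects a real buffer, not NULL */
    clock_t start = times(&unused);

    for (int i = 0; i < reps; i++)
        fn(dst, src, n);

    return times(&unused) - start;
}
```

Calling time_copy(memcpy, dst, src, size, reps) and then the same
with each replacement gives numbers that can be compared directly,
as long as size and reps stay the same (smaller is better).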
We can also look into getting an extra version of memcpy that
makes the transfers with floating point registers as some
people suggested on the Debian PowerPC mailing list.
People there said that using floating point registers (which
are 64 bits wide) instead of general purpose registers (32
bits each) may improve things, since each load and store would
move twice as much data.
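
To make the idea concrete, here is a rough sketch in C of a copy
that moves 8 bytes at a time through double values. It is not from
the patch or from the kernel: the name fpr_memcpy_sketch is made up,
it assumes the buffers do not overlap, and whether the compiler
really uses floating point loads and stores (lfd/stfd on PowerPC)
depends on the compiler and its flags. A real version would probably
be written in assembly like the other routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies n bytes from src to dst, using 8-byte
 * `double` loads and stores when both pointers are 8-byte aligned
 * and plain byte copies otherwise. Buffers must not overlap. */
void *fpr_memcpy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Take the 8-byte path only if both pointers are aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 7u) == 0) {
        double *dd = (double *)d;
        const double *ds = (const double *)s;
        size_t blocks = n / 8;

        /* Each iteration moves 64 bits instead of 32. */
        for (size_t i = 0; i < blocks; i++)
            dd[i] = ds[i];

        d += blocks * 8;
        s += blocks * 8;
        n -= blocks * 8;
    }

    /* Copy the remaining bytes (or everything, if unaligned). */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

This relies on the double loads and stores leaving all 64 bits
untouched, so it would need testing on each target before anybody
trusts it with video data.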
> I'd like to know how much it helps PPC users, so keep this list up
> to date with the results. (Also, if my patch breaks other
> platforms...) It gave my Mac laptop the little boost it needed to
> play some media I have.
Well, with the faster memcpy and with XFree86 4.2.0 (with DMA
enabled), I can watch a DVD here with linearblend
deinterlacing (coded in C) enabled, and about 15% of the
frames are skipped. While still not perfect, that is quite an
improvement over the situation some weeks ago.
BTW, I am using gcc-3.0 to compile xine-libs and I added some
extra options to the configure script (-mfused-madd,
-mcpu=750, -mtune=750, -O9).
The next points of improvement (which may not be as immediate
as the memcpy being discussed) may be coding the idct, motion
compensation and deinterlacing in assembly as well.
I guess that I'll have to learn a bit more before I can get
to these, but with the help of other people, things could go
faster.
> Just so you know, the methods I used are from the linux kernel
> version 2.4.18 (arch/ppc/lib/string.S)
Yes, that's what I tried in my earlier message, but I wasn't
as successful as you were.
Your patch had a problem, though, and I had to apply part of
it by hand. You might want to remake it and send it to the
xine developers so that it can be included in xine release
0.9.10, which should be near.
> Andrew Patrikalakis
Thanks for your help, Roger...
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rogério Brito - rbrito@iname.com - http://www.ime.usp.br/~rbrito/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=