
Re: [xine-user] [ANN] PowerPC Assembly Patch



>	We can also look into getting an extra version of memcpy that
>	makes the transfers with floating point registers as some
>	people suggested on the Debian PowerPC mailing list.
>
>	People there said that using floating point registers (which
>	are 64 bits large) instead of general purpose registers (32
>	bits each) may improve things.

It will, but you have to be properly aligned. Another thing you could
do is write an Altivec memcpy. I saw there is one in the latest VideoLAN
CVS that you can grab, though I don't know if it handles misaligned
transfers. (Altivec can deal with those more easily than the FPU, though
it's always better to have things aligned; in the Altivec case the
alignment boundary is 4 words, i.e. 128 bits.)
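
To make the alignment point concrete, here is a rough C sketch of the
usual head/body/tail structure. It is not taken from the patch or from
the VideoLAN code; the names bytes_to_boundary and copy_wide are made up
for illustration. The 8-byte body step stands in for one lfd/stfd pair
in a real asm version, and the inner memcpy calls are only the portable
C way to express a single 64-bit load and store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: how many bytes must be copied one at a time
   before p reaches the next `align`-byte boundary (align must be a
   power of two: 8 for an FPU register, 16 for an Altivec vector). */
static size_t bytes_to_boundary(const void *p, size_t align)
{
    return (size_t)(-(uintptr_t)p & (align - 1));
}

/* Hypothetical copy routine: uses 8-byte transfers only when src and
   dst can be brought to an 8-byte boundary together, otherwise falls
   back to a plain byte loop. */
static void *copy_wide(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Same offset modulo 8 means both become aligned at the same time. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 7) == 0) {
        size_t head = bytes_to_boundary(d, 8);
        if (head > n)
            head = n;
        n -= head;
        while (head--)              /* unaligned head, byte by byte */
            *d++ = *s++;
        for (; n >= 8; n -= 8, d += 8, s += 8) {
            uint64_t w;             /* stand-in for one 64-bit register */
            memcpy(&w, s, 8);       /* one aligned 64-bit load */
            memcpy(d, &w, 8);       /* one aligned 64-bit store */
        }
    }
    while (n--)                     /* tail, or the whole misaligned copy */
        *d++ = *s++;
    return dst;
}
```

When the two pointers do not share the same offset, this sketch simply
gives up on wide transfers; a real implementation would more likely
align the destination and handle the source with shifts or permutes.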

>> I'd like to know how much it helps PPC users, so keep this list up
>> to date with the results. (Also, if my patch breaks other
>> platforms...) It gave my Mac laptop the little boost it needed to
>> play some media I have.
>
>	Well, with the faster memcpy and with XFree86 4.2.0 (with DMA
>	enabled), I can watch a DVD here with linearblend
>	deinterlacing (coded in C) enabled and there are about 15% of
>	frames skipped, which while still not perfect, is quite an
>	improvement in face of the situation some weeks ago.

With an encrypted DVD, I also noticed the kernel using PIO transfers
instead of DMA. I think there's a patch floating around to improve that.
Also make sure IRQ unmasking is enabled on your DVD drive
(hdparm -u1 /dev/hdc).

>	BTW, I am using gcc-3.0 to compile xine-libs and I added some
>	extra options to the configure script (-mfused-madd,
>	-mcpu=750, -mtune=750, -O9).
>
>	The next points of improvement (which may not be as immediate
>	as using the memcpy being discussed) may be coding the idct,
>	motion compensation and deinterlacing in assembly also.

There are already Altivec implementations, but no ordinary PPC asm ones.
Another big killer on PPC is byte access. Look at the bitstream
decoding: if you manage to do only 32-bit aligned loads from memory and
then do the splitting in registers, you may actually improve
performance slightly.
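
As a rough illustration of what I mean, here is a small C sketch of a
bit reader that only ever touches memory with aligned 32-bit loads and
does all the splitting with shifts in a register. This is not xine's
bitstream code; the bitreader struct and the br_init, br_get and
be32_to_host names are invented for the example, and it assumes the
buffer is 32-bit aligned and the stream is read MSB first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader state (illustrative names only). */
typedef struct {
    const uint32_t *next;  /* next aligned 32-bit word to load */
    const uint32_t *end;   /* one past the last word */
    uint64_t cache;        /* unread bits, left-aligned at bit 63 */
    unsigned bits;         /* number of valid bits in cache */
} bitreader;

/* Interpret a word as stored in the stream (big-endian) in host order.
   On big-endian PPC this is the identity. */
static uint32_t be32_to_host(uint32_t w)
{
    const unsigned char *b = (const unsigned char *)&w;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

static void br_init(bitreader *br, const uint32_t *buf, size_t words)
{
    br->next = buf;
    br->end = buf + words;
    br->cache = 0;
    br->bits = 0;
}

/* Return the next n bits (1..32). Bits past the end read as zero. */
static uint32_t br_get(bitreader *br, unsigned n)
{
    if (br->bits < n && br->next < br->end) {
        /* The only memory access: one aligned 32-bit load. */
        uint64_t w = be32_to_host(*br->next++);
        br->cache |= w << (32 - br->bits);
        br->bits += 32;
    }
    /* Everything else is shifting inside a register. */
    uint32_t v = (uint32_t)(br->cache >> (64 - n));
    br->cache <<= n;
    br->bits = br->bits >= n ? br->bits - n : 0;
    return v;
}
```

The 64-bit cache is just a convenience for the sketch; on a 32-bit PPC
you would probably keep it in a pair of registers, or use a 32-bit cache
and handle the case where a field straddles two words.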

>	I guess that I'll have to learn a bit more before I can get
>	to these, but with the help of other people, things could go
>	faster.
>
>> Just so you know, the methods I used are from the linux kernel
>> version 2.4.18 (arch/ppc/lib/string.S)
>
>	Yes, that's what I tried in my earlier message, but I wasn't
>	as successful as you were.
>
>	Your patch had a problem, though and I had to apply a part of
>	it by hand. You might perhaps want to remake it and send to
>	the xine developers so that it can be included for xine
>	release 0.9.10, which should be near.
>
>> Andrew Patrikalakis
>
>
>	Thanks for your help, Roger...
>
>--
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>  Rogério Brito - rbrito@iname.com - http://www.ime.usp.br/~rbrito/
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
>
>--
>To UNSUBSCRIBE, email to debian-powerpc-request@lists.debian.org
>with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
>



