
Re: Using PPC asm (from Linux kernel) in xine



Michel Lanners writes:
> On  22 May, this message from Albert D. Cahalan echoed through cyberspace:
>> =?iso-8859-1?Q?Rog writes:

> [about fast memcopy in asm]
>
>>   dcbt  eight,src        /* prefetch the next cache line */
>> loop_top:
>>   dcba  eight,dst        /* allocate a cache line */
>>   lfd   f11,8,(src)
>>   lfd   f12,16,(src)
>>   lfd   f13,24,(src)
>>   lfdu  f14,32,(src)
>               ^^
>    These should probably be 0,8,16 and 24... same goes below.

Nope. I thought so too, but then lfdu would advance
the pointer by only 24 bytes per iteration. Instead,
the pointer trails the data by 8 bytes to compensate:
the setup code subtracts 8 before entering the loop.
That in turn is why I need "eight", a register holding
8, as the index for dcbt and dcba.
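
To make the offset arithmetic concrete, here is a small C model.
It is only a sketch: pointers are represented as plain byte
offsets, and the names ea_plain and ea_update are made up for
this illustration, not taken from any real code.

```c
/* Hypothetical model, not from the original post: pointers are
   represented as byte offsets so the arithmetic is easy to follow. */

/* Like lfd: the effective address is base + disp; base is unchanged. */
long ea_plain(long base, long disp)
{
    return base + disp;
}

/* Like lfdu: the effective address is base + disp, and base is then
   updated to that address. */
long ea_update(long *base, long disp)
{
    *base += disp;
    return *base;
}
```

Starting from a base of -8 (the setup code's bias), displacements
8, 16, 24 and 32 touch offsets 0, 8, 16 and 24, and the update
leaves the base at 24, so the next iteration starts at offset 32.
With displacements 0, 8, 16, 24 and no bias, the update would add
only 24 and the next iteration would reload the last double.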

I think the dcbt might not be useful, since there
won't ever be a spare memory cycle for prefetch.

>>   dcbi  r0,src           /* would like to discard the src data */
>>   dcbt  eight,src        /* prefetch the next cache line */
>>   stfd  f11,8,(dst)
>>   stfd  f12,16,(dst)
>>   stfd  f13,24,(dst)
>>   stfdu f14,32,(dst)
>>   dcbf  r0,dst           /* write back if needed, then invalidate */
>
> And don't forget the loop counter and some increment...
> Or code it with a decrement operator and copy backwards?

I didn't forget; look again. This is PowerPC. :-)
Here's the assembly with C code:

dcbt  eight,src     // prefetch the cache line with src[1]...src[4]
loop_top:           do{
dcba  eight,dst     // allocate a cache line for dst[1]...dst[4]
lfd   f11,8,(src)   double_1 = src[1];
lfd   f12,16,(src)  double_2 = src[2];
lfd   f13,24,(src)  double_3 = src[3];
lfdu  f14,32,(src)  double_4 = src[4]; src += 4;
dcbi  r0,src        // would like to discard the src[-3]...src[0]
dcbt  eight,src     // prefetch the cache line with src[1]...src[4]
stfd  f11,8,(dst)   dst[1] = double_1;
stfd  f12,16,(dst)  dst[2] = double_2;
stfd  f13,24,(dst)  dst[3] = double_3;
stfdu f14,32,(dst)  dst[4] = double_4; dst += 4;
dcbf  r0,dst        // write back dst[-3]...dst[0] if needed, then invalidate it
bdnz  loop_top      }while(--ctr);
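
For anyone who wants to try the loop structure without the cache
hints, a portable C sketch follows. It assumes unbiased pointers
(indices 0..3 instead of 1..4) to stay within what C allows, and
the function name copy_blocks is made up for this example. As with
bdnz, the counter is decremented and then tested, so ctr must be
at least 1 on entry.

```c
#include <stddef.h>

/* Hypothetical helper: copy ctr blocks of four doubles (32 bytes each).
   ctr must be >= 1, because the test happens after the decrement. */
void copy_blocks(double *dst, const double *src, size_t ctr)
{
    do {
        /* four loads, then advance src (the lfd/lfdu group) */
        double d1 = src[0];
        double d2 = src[1];
        double d3 = src[2];
        double d4 = src[3];
        src += 4;

        /* four stores, then advance dst (the stfd/stfdu group) */
        dst[0] = d1;
        dst[1] = d2;
        dst[2] = d3;
        dst[3] = d4;
        dst += 4;
    } while (--ctr);    /* bdnz: decrement ctr, branch if nonzero */
}
```

A caller would pass the byte count divided by 32 as ctr and handle
any leftover bytes separately.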

The copy should run in the opposite direction to the one
used to fill src[] with data. That way the most recently
written data, which is the most likely to still be cached,
gets read first, so you make better use of the cache. The
cache control instructions should keep this from mattering
much, though.
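
As a rough illustration of the direction point, here is a C sketch
that copies from the highest index down, which would suit a src[]
that was filled from index 0 upward. The name copy_descending is
made up, and whether this actually helps depends on the cache and
the data size, so treat it as an assumption to measure.

```c
#include <stddef.h>

/* Hypothetical helper: copy n doubles starting at the end of the
   buffers and working down to index 0. */
void copy_descending(double *dst, const double *src, size_t n)
{
    while (n > 0) {
        n--;
        dst[n] = src[n];
    }
}
```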

>> bdnz  loop_top
>> 
>> BLAH, BLAH...


