
Re: iBook and playing DVDs



Rogério Brito writes:
> On May 11 2002, Michel Lanners wrote:
>> On  10 May, this message from Rogério Brito echoed through cyberspace:

> [High CPU usage during video output]
>> That's essentially because of MTRR on i386. I wouldn't know hard
>> numbers to compare, but at least subjectively, MTRR helps a lot for
>> the copy to VRAM of the video out data.
>
> Ouch, I miss MTRR. :-(

As far as XFree86 is concerned, MTRR is a Linux kernel
feature. The kernel's MTRR driver already handles similar
features on non-Intel chips; it could do something useful
on PowerPC too.

Things are a little different on PowerPC, with the
attributes specified on a per-page basis. It's not
good to have two mappings with different attributes
for the same physical address.

Still, it should be doable. You'll want the memory to
be write-back cached, not guarded, and not coherent.
Then you push the data out by hand with the cache
control instructions. (Being not coherent is fine
as long as you don't need to worry about getting
swapped out or scheduled on another CPU.)
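
A rough sketch of the "push it out by hand" step, in C with
GCC inline assembly. The function names are made up for
illustration, and the 32-byte line size is an assumption that
holds for the 750 but not for every PowerPC. dcbst writes a
modified cache block back to memory and sync waits for the
stores to complete; on a non-PowerPC build the sketch only
counts lines, so it compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* assumption: 32-byte lines, as on the 750 */

/* Write one cache block back to memory (no-op when not built for PowerPC). */
static void writeback_line(const void *p)
{
#if defined(__powerpc__) || defined(__ppc__)
    __asm__ volatile ("dcbst 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/* Write back every cache line touched by [start, start+len).
 * Returns the number of lines visited. */
size_t writeback_range(const void *start, size_t len)
{
    assert(start != NULL);
    if (len == 0)
        return 0;

    /* Round the start down to a line boundary. */
    uintptr_t p   = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t end = (uintptr_t)start + len;
    size_t lines  = 0;

    for (; p < end; p += CACHE_LINE) {
        writeback_line((const void *)p);
        lines++;
    }
#if defined(__powerpc__) || defined(__ppc__)
    __asm__ volatile ("sync" : : : "memory");  /* wait for the write-backs */
#endif
    return lines;
}
```

You would call writeback_range() on each block of pixels
after writing it, before the hardware is expected to see it.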

>> - optimize a few of the more processor-intensive parts
>>   of the algorithms with handcoded ASM. Good luck...
>
> Well, I'm so pissed that I am currently even considering
> learning PPC's assembly for this task. I even downloaded
> Motorola's user guide for the G3. :-( The only problem
> now is lack of time.

PowerPC assembly is easy, at least w/o using AltiVec.
Just remember that Linux leaves the LE and ILE bits
in the MSR clear, so the CPU runs big-endian: a
multi-byte value is stored most-significant byte
first, which looks backwards if you come from x86.
Other than that, it's pretty sane. The documentation
is crap though; simplified mnemonics like "li" (really
an addi with a zero base) are used all over the place
in "real" code but you can't find them in the index
or opcode table.

Tricks to know:

Motorola will ship you free books if you can figure
out where to ask. Dig around on their web site.

There are 3 instructions (rlwinm, rlwimi, and rlwnm)
that let you rotate, shift, and mask in one go. Learn
to use them.
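
To get a feel for the rotate-and-mask family, here is a small
C model of rlwinm and rlwimi. It is only a sketch for playing
with operand values on any machine; the helper names rotl32 and
ppc_mask are invented here. Note the PowerPC bit numbering:
bit 0 is the most significant bit, and a mask with MB > ME
wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by sh bits (0..31) without undefined shift counts. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31u;
    return (x << sh) | (x >> ((32u - sh) & 31u));
}

/* Mask with ones from bit mb through bit me, PowerPC numbering
 * (bit 0 = MSB). If mb > me the mask wraps around the word. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32u && me < 32u);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* bits mb..31 */
    uint32_t upto_me = 0xFFFFFFFFu << (31u - me);  /* bits 0..me  */
    return mb <= me ? (from_mb & upto_me) : (from_mb | upto_me);
}

/* rlwinm rA,rS,SH,MB,ME: rotate left, then AND with the mask. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi rA,rS,SH,MB,ME: rotate left, then insert under the mask,
 * keeping the bits of rA that lie outside the mask. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

With this model, rlwinm(x, 8, 24, 31) pulls out the top byte
of x, rlwinm(x, 32 - n, n, 31) is a logical right shift by n,
and rlwinm(x, n, 0, 31 - n) is a left shift by n.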

The FPU runs in parallel to the integer unit(s).
You'll want it in the non-recoverable mode (the
FE0 and FE1 bits in the MSR control this) with all
exceptions off.

An unsigned int up to 0x007fffff is already a float
(a denormal), with a factor of 2**149 to annoy you.
(a 149-bit shift) This won't work for AltiVec, or a
6xx with the NI bit set in the FPSCR.
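
A quick way to convince yourself of the scale factor, as a
C sketch that runs on any IEEE 754 machine with denormals
enabled. The function names are invented for illustration.
A single-precision denormal with fraction bits n has the
value n * 2**-149, so multiplying by 2**149 gives n back.
The sketch does the multiply in double only because 2**149
does not fit in a float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of n as a single-precision float.
 * For n <= 0x007fffff the exponent field is zero, so the
 * result is a denormal (or zero). */
float bits_as_float(uint32_t n)
{
    assert(n <= 0x007fffffu);
    float f;
    memcpy(&f, &n, sizeof f);
    return f;
}

/* Undo the implicit scaling: the denormal is n * 2**-149. */
uint32_t int_from_denormal(float f)
{
    return (uint32_t)((double)f * 0x1p149);
}
```

int_from_denormal(bits_as_float(n)) should give back n for
every n in range, as long as nothing flushes denormals to zero.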

Look at "gcc -S" output. It's kind of hard to
read though, because the compiler uses raw numbers
("6") instead of register names ("r6" or "f6").

Interleave your code to avoid stalls.

The cache manipulation instructions are nice.
Even w/o the extra AltiVec ones, you can do a
lot with 32-byte chunks of memory.

I find that drawing on a physical piece of paper
helps with register allocation. Scatter instructions all
over the paper, using arrows to indicate dependencies.
Use "r[]" to name every register, where "[]" is just
a placeholder box you draw. Shift the instructions
around so that you avoid having an arrow between
directly adjacent instructions. As you work, label the
arrows like this: r4, memory, cr2, f11, r8. You are
done when all of the boxes have been filled in.

Linux doesn't trash any registers. Thread libraries
might trash some. There is a register reserved for
the OS, and another for a small data area (r2 and
r13 in the SVR4 ABI)... you can abuse both of them.

Use the CTR register for call-by-pointer. You can
also use it for inner loops. Don't store useless
state in a leaf function.

Set the branch prediction bits on conditional branches.

Don't waste your time searching for a conditional
move instruction. ARM, x86, Alpha, and IA-64 all
have this, but not PowerPC. You'll have to jump if
you can't get by with some sort of computation.
If you take pride in straight-line code, PowerPC
is going to frustrate you.
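
"Some sort of computation" usually means building an
all-ones or all-zeros mask and using it to pick a value.
A minimal sketch in C follows; the function names are
invented here, and whether the compiler really emits
branch-free code for it is something to check in the
"gcc -S" output rather than assume.

```c
#include <stdint.h>

/* 0xFFFFFFFF if cond is nonzero, 0 otherwise. */
static uint32_t mask_from(int cond)
{
    return (uint32_t)0 - (uint32_t)(cond != 0);
}

/* Branch-free "cond ? a : b". */
uint32_t select_u32(int cond, uint32_t a, uint32_t b)
{
    uint32_t m = mask_from(cond);
    return (a & m) | (b & ~m);
}

/* Branch-free absolute value: the mask is all ones when x is
 * negative, and (x ^ m) - m then negates x in two's complement.
 * INT32_MIN has no positive counterpart; converting that result
 * back to int32_t is implementation-defined. */
int32_t abs_i32(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t m  = (uint32_t)0 - (ux >> 31);
    return (int32_t)((ux ^ m) - m);
}
```

The same mask idea covers min, max, and clamping, which is
most of what a video inner loop needs.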

If you need to divide by a constant, use mulhw with
a mysterious constant and then shift right. Search
the web for more info, or grab the constant from gcc
output.
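
For example, unsigned division by 10 can be modelled in C
like this. The helper mulhwu_model is an invented name that
mimics the unsigned high-word multiply (mulhwu; mulhw is the
signed form). The constant 0xCCCCCCCD is ceil(2**35 / 10),
so the high word shifted right by 3 is a total shift of 35.
Other divisors need their own constant and shift.

```c
#include <stdint.h>

/* High 32 bits of the 64-bit unsigned product. */
static uint32_t mulhwu_model(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* x / 10 for any 32-bit unsigned x, with no divide instruction. */
uint32_t div10(uint32_t x)
{
    return mulhwu_model(x, 0xCCCCCCCDu) >> 3;
}
```

A loop comparing div10(x) against x / 10 over the inputs you
care about is a cheap sanity check for any constant you
borrow.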

> BTW, I'm using the following options to compile the programs I
> am trying:
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> CC="gcc-3.0"
> CFLAGS="-O3 -fomit-frame-pointer -ffast-math -frename-registers \
> 	    -mtune=750 -mcpu=750 -mfused-madd"
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> 	Any suggestions on what else I should use? Perhaps gcc-3.1 is
> 	a bit better regarding optimizations?

Try -O2 or -Os instead of -O3.

