Re: Xvideo acceleration: GATOS for PPC?
Michel Lanners writes:
> I wouldn't know for top, but I can say that mtrr definitely makes a
> difference: 45% cpu for X without mtrr, down to roughly 5% with the
> proper mtrr configured. So that was it.
I mostly fixed top. Debian-unstable has the fix. There is a bit of
randomness due to data collection that isn't instant, and the very
first screen is bogus due to kernel limitations.
> We might be able to squeeze a few percent out of better caching for the
> framebuffer (making X's framebuffer mapping cacheable enables bursting
> from the CPU; combined with float or vector stores instead of regular
> memcpy that should give a boost that _could_ come close to what mtrr
> achieves on i386).
Oh, you really want to play with the WIMG bits! It would be
nice if there were arch-specific mmap() flags for this.
If you can spare a BAT register, use it. (maybe use page tables
for most of the kernel address space, with just one BAT to
cover the kernel itself)
The WIMG setting should be 0000. This is:
    W=0: write-back (not write-through)
    I=0: caching allowed (not cache-inhibited)
    M=0: coherency not enforced (must use cache control instructions!!!)
    G=0: not guarded against ordering/merging/speculative troubles
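To make the bit meanings concrete, here is a minimal sketch in C that decodes a 4-bit WIMG value. The names (wimg_decode, struct wimg_attrs, the WIMG_* constants) are made up for illustration and are not a kernel API; the only assumption is that W is the most significant of the four bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants: WIMG as a 4-bit field, W in the top bit. */
enum {
    WIMG_W = 0x8, /* write-through */
    WIMG_I = 0x4, /* caching inhibited */
    WIMG_M = 0x2, /* memory coherence enforced */
    WIMG_G = 0x1  /* guarded */
};

struct wimg_attrs {
    bool write_through;
    bool cache_inhibited;
    bool coherent;
    bool guarded;
};

/* Split a WIMG value into its four attributes. */
static struct wimg_attrs wimg_decode(unsigned wimg)
{
    struct wimg_attrs a;

    assert(wimg <= 0xF); /* only four bits are meaningful */
    a.write_through   = (wimg & WIMG_W) != 0;
    a.cache_inhibited = (wimg & WIMG_I) != 0;
    a.coherent        = (wimg & WIMG_M) != 0;
    a.guarded         = (wimg & WIMG_G) != 0;
    return a;
}
```

With this, wimg_decode(0x0) has all four fields false, which is the write-back, cacheable, non-coherent, unguarded mapping described above. A conservative I/O mapping would more typically look like 0x5 (cache-inhibited and guarded).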
Then, for each cache line of the frame buffer, you do:
    1. "dcba" to allocate the cache line without reading it from memory
    2. fill the cache line with your data
    3. "dcbf" to write out and then free the cache line
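The per-line sequence above might be sketched roughly like this in C. This is only an illustration under assumptions: a 32-byte cache line, a frame buffer mapping that is already cacheable, and a CPU that actually implements "dcba" (it is optional). The macro FB_USE_CACHE_OPS and the function names are hypothetical; without that macro the cache operations compile to nothing, so the loop is just a line-by-line copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u /* assumption: 32-byte cache lines */

#if defined(__powerpc__) && defined(FB_USE_CACHE_OPS)
/* Allocate the line in the cache without fetching it from memory. */
static inline void line_alloc(void *p)
{
    __asm__ volatile("dcba 0,%0" : : "r"(p) : "memory");
}
/* Write the line back to memory and drop it from the cache. */
static inline void line_flush(void *p)
{
    __asm__ volatile("dcbf 0,%0" : : "r"(p) : "memory");
}
#else
/* Portable fallback: no cache control, plain copy only. */
static inline void line_alloc(void *p) { (void)p; }
static inline void line_flush(void *p) { (void)p; }
#endif

/* Copy nbytes to the frame buffer one whole cache line at a time. */
static void fb_write_lines(uint8_t *fb, const uint8_t *src, size_t nbytes)
{
    size_t off;

    assert((uintptr_t)fb % CACHE_LINE == 0); /* fb must be line-aligned */
    assert(nbytes % CACHE_LINE == 0);        /* whole lines only */

    for (off = 0; off < nbytes; off += CACHE_LINE) {
        line_alloc(fb + off);                    /* step 1 */
        memcpy(fb + off, src + off, CACHE_LINE); /* step 2 */
        line_flush(fb + off);                    /* step 3 */
    }
}
```

Every line must be completely overwritten, because an allocated line holds undefined data until it is filled. A real implementation would probably also want a "sync" after the last flush before telling the hardware the data is there.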
That "fill the cache line with your data" part should also have
some sort of cache control for the data being read. Most likely it
should use the AltiVec data-stream prefetch instructions ("dst" and
friends) for streaming data too.
As always, unroll the loop a bit so that you can move instructions
around. You need to do this to avoid stalls due to instructions
needing to wait for preceding instructions to complete. Put as much
distance between such instructions as you can.
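As a rough illustration of the unrolling idea, here is a C sketch that unrolls a word copy by four and issues all the loads before any of the stores, so no store has to wait on the load right in front of it. The function name copy_unrolled is hypothetical, and a good compiler may well do this scheduling on its own; hand-written assembly would follow the same pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy nwords 32-bit words, four per iteration. */
static void copy_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t i = 0;

    assert((dst != NULL && src != NULL) || nwords == 0);

    for (; i + 4 <= nwords; i += 4) {
        /* Four independent loads first... */
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];

        /* ...then the stores, each well separated from its load. */
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* Leftover words when nwords is not a multiple of four. */
    for (; i < nwords; i++)
        dst[i] = src[i];
}
```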
Here is your %CPU goal:
You should be able to get pretty close to that, since memory
operations may be interleaved with other operations that will
then become "free", just as the TCP/IP checksum comes "free"
with a copy.
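For anyone who hasn't seen the checksum trick, a minimal sketch in C: the copy loop already has each word in a register, so adding it to a running sum costs almost nothing extra. The function name copy_and_checksum is hypothetical, and this is the plain 16-bit ones'-complement Internet checksum, not any particular kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n 16-bit words and return their Internet checksum. */
static uint16_t copy_and_checksum(uint16_t *dst, const uint16_t *src, size_t n)
{
    uint64_t sum = 0;
    size_t i;

    assert((dst != NULL && src != NULL) || n == 0);

    for (i = 0; i < n; i++) {
        uint16_t w = src[i]; /* one load feeds both uses */
        dst[i] = w;          /* the copy */
        sum += w;            /* the "free" checksum work */
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the words 0x0001, 0xF203, 0xF4F5, 0xF6F7 the sum is 0x2DDF0, which folds to 0xDDF2, so the function returns 0x220D.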