
Re: iBook and playing DVDs



benh@kernel.crashi writes:
> [Albert Cahalan]

>> It is not common to have the video card writing
>> to AGP memory.
>
> By default, the r128 and radeon DRI drivers have the card write
> the ring readptr to AGP memory, but doing so seems to be
> broken on some HW (UniNorth 1.0.x and some ia64 bridges
> don't deal with that properly)

This looks like just one thing that you could put
on a cache line (or page) by itself.

To view it from the CPU:

1. junk the cache line containing this value
2. load the value in the normal way

Anything else?
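
A minimal sketch of those two steps, in C with GCC-style inline asm.
The helper name read_device_word is made up for illustration. I use
dcbf here rather than dcbi because dcbi is supervisor-only; note that
dcbf writes a dirty line back before invalidating it, which is one
more reason to keep this value on a line the CPU never writes. On
non-PowerPC builds the flush collapses to a compiler barrier so the
sketch still compiles.

```c
#include <stdint.h>

/* Hypothetical helper: re-read a word that a bus master (e.g. the
 * video card) may have updated behind the CPU's back. */
static inline uint32_t read_device_word(volatile uint32_t *p)
{
#if defined(__powerpc__) || defined(__PPC__)
    /* Step 1: junk the cache line. dcbf writes the line back if it is
     * dirty, then invalidates it; sync waits for that to finish. */
    __asm__ volatile("dcbf 0,%0\n\tsync" : : "r"(p) : "memory");
#else
    /* Not PowerPC: compiler barrier only, no cache operation. */
    __asm__ volatile("" : : : "memory");
#endif
    /* Step 2: load the value in the normal way. */
    return *p;
}
```

A caller would poll with something like
"rptr = read_device_word(ring_rptr);" where ring_rptr points at a
cache-line-aligned word that shares its line with nothing else.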

>> It is not common for the user to read AGP memory.
>
> You don't know. If it's cacheable, writing a byte will cause
> a CPU load of the entire cacheline for example.

Arrrgh... I forget that users don't always write
full cache lines. Still, no problem unless the
cache line is shared with something incompatible.

>> User apps need caching off by default, since trying to
>> update all the apps would be insane.
>>
>> Unless user code will write to AGP memory on one
>> processor and read or write on another processor,
>> the M bit (Memory Coherency Attribute) can be
>> cleared. It's pointless for the CPU to waste bus
>> cycles trying to be coherent, since the video card
>> will not cooperate. All non-SMP systems should
>> map the AGP memory with coherency disabled.
>
> I'm not too sure about that. What about one CPU writing
> half a cache line of the ring buffer in AGP memory, and
> another CPU writing the other half ?

First of all, consider non-SMP. There is no other CPU.
You don't need coherency between the CPU and PCI,
because this isn't memory you'd swap out. Would a
user try to read() into this memory? (I hope not.)
So unless you need eieio to work in this memory,
making it non-coherent should be OK.

Now consider SMP. I'm guessing that this ring
buffer holds commands that are being issued to
the video card. If the kernel writes this data,
then perhaps you should write back and free the
cache line. It's a burst write. Maybe you can
avoid reading that cache line in again if you
can pad out to the end of it with NOPs.
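
Roughly what I mean by padding, as a sketch only. The 32-byte line
size, the RING_NOP value, and the function name are all assumptions
for illustration; the real NOP encoding depends on the card's command
format, which I haven't looked up.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u          /* assumption: 32-byte cache lines */
#define RING_NOP         0x00000000u  /* placeholder, not a real opcode  */
#define WORDS_PER_LINE   (CACHE_LINE_BYTES / sizeof(uint32_t))

/* Append NOP words starting at index `tail` until `tail` sits on a
 * cache-line boundary. Assumes `ring` is cache-line aligned and
 * `ring_words` is a multiple of WORDS_PER_LINE. Returns the new tail,
 * which may equal ring_words; wrapping is left to the caller. */
static size_t ring_pad_to_line(uint32_t *ring, size_t ring_words, size_t tail)
{
    while (tail % WORDS_PER_LINE != 0 && tail < ring_words)
        ring[tail++] = RING_NOP;
    return tail;
}
```

After padding, every line the CPU touched is completely overwritten,
so it can be written back and dropped without caring what was in
memory before.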

>> No existing PowerPC will do unrequested prefetching
>> across page boundaries, or this is easily avoided
>> by not using memory adjacent to the boundary
>> between AGP memory and non-AGP memory.
>
> That isn't a problem, though I'm not sure about your
> statement that they won't do unrequested prefetching.
> Do you have some pointers to the docs ?

This is from David S. Miller and others speculating
on linux-kernel:

"I don't think your PPC case needs the kernel mappings
messed with. I really doubt the PPC will speculatively
fetch/store to a TLB missing address..."

This is about the problem that hit Athlon users:

AGP memory was mapped uncacheable.
AGP memory was covered by the normal kernel mapping.
The kernel would write to unrelated nearby memory.
The CPU would speculatively fetch into AGP memory.
(this is a load, to be used for a partial write)
The CPU would mark this cache line dirty.
The cache line is never written to.
The CPU would write out the cache line.

So the CPU ends up reading from AGP memory, and
then writing it back unchanged. Meanwhile, stuff
was written to AGP memory. The CPU is doing
something stupid, but AMD said that they had
every right to do so. Linux mapped the AGP memory
as both cacheable and not, the Athlon followed
the coherency protocol, and as documented the
motherboard didn't bother with coherency.
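
Here is a toy model of that sequence in plain C, just to make the
ordering concrete. Nothing in it touches real hardware: the "cache"
is a struct, the line size is an assumed 32 bytes, and every name is
invented for the illustration.

```c
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32  /* assumed line size for the model */

struct toy_line {
    uint8_t data[LINE_BYTES];
    int valid;
    int dirty;
};

/* CPU speculatively fetches the line through the cacheable mapping,
 * intending a partial write, and marks it dirty right away. */
static void cpu_speculative_fetch(struct toy_line *c, const uint8_t *agp)
{
    memcpy(c->data, agp, LINE_BYTES);
    c->valid = 1;
    c->dirty = 1;
}

/* A write that bypasses the cache (uncached mapping or bus master).
 * Nothing snoops it, so the cached copy is now stale. */
static void uncached_write(uint8_t *agp, uint8_t value)
{
    memset(agp, value, LINE_BYTES);
}

/* CPU evicts the line; it is dirty, so it gets written out. */
static void cpu_evict(struct toy_line *c, uint8_t *agp)
{
    if (c->valid && c->dirty)
        memcpy(agp, c->data, LINE_BYTES);
    c->valid = 0;
    c->dirty = 0;
}

/* Runs the sequence; returns 1 if the uncached write was lost. */
static int stale_writeback_demo(void)
{
    uint8_t agp[LINE_BYTES];
    struct toy_line cache = {0};

    memset(agp, 0x11, sizeof agp);      /* old contents           */
    cpu_speculative_fetch(&cache, agp); /* line cached, dirty     */
    uncached_write(agp, 0x22);          /* new data lands in AGP  */
    cpu_evict(&cache, agp);             /* stale 0x11 written out */
    return agp[0] == 0x11;
}
```

In this model stale_writeback_demo() returns 1: the unchanged but
dirty line overwrites the newer data.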

As far as I can tell, Motorola could claim exactly
the same thing. We BAT-map the AGP memory with
caching enabled, don't we? That's a conflict.

>> If apps would at least avoid reading stuff written
>> by the video card, write-through caching would be OK.
>> Apps that read AGP memory are uncommon enough that
>> fixing all of them would be feasible.
>
> I think we can use full caching (copyback) without too many
> problems. In the r128 case, we'll have to flush from the X server
> as it's directly writing to the ring (and maybe from the mesa driver
> as well). On radeon, it's all done via indirect buffers and those
> get passed to the kernel driver before being inserted in the ring.
>
> So we can definitely improve the throughput by letting it be
> cacheable. The main reason I didn't work on this yet is that I want
> the driver to be stable first to avoid possibly mixing problems.
> Currently, I haven't managed to figure out what is causing the
> card lockups when AGP is used.

Maybe the problems will go away if you cache the AGP memory.
It's worth a try, and makes stuff faster anyway.

