On Sat, 2006-04-08 at 12:55 +1000, Paul Mackerras wrote:
This patch fixes it for me on my powerbook (1.5GHz albook). The issue
seems to be that the cpu objects to HID0_NAP being cleared in HID0.
If I have this code in power_save_6xx_restore, it hangs:
_GLOBAL(power_save_6xx_restore)
mfspr r11,SPRN_HID0
rlwinm r11,r11,0,10,8 /* Clear NAP */
mtspr SPRN_HID0,r11
b transfer_to_handler_cont
If I take out that rlwinm, it boots. Bizarre.
If you do that, you cause the transfer_to_handler to always call
power_save_6xx_restore even when not coming back from idle.
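For anyone puzzling over the mask in that rlwinm: with MB=10 and ME=8 the
mask wraps around and keeps every bit except IBM-numbered bit 9, which is
the NAP bit. Below is a rough userspace sketch of that arithmetic, not
kernel code. The helper names (rlwinm_mask, rotl32, rlwinm) are made up for
illustration, and HID0_NAP_BIT is a local stand-in that I am assuming
matches the kernel's HID0_NAP (1 << 22).

```c
#include <stdint.h>

/* Assumption: stand-in for the kernel's HID0_NAP, i.e. IBM bit 9. */
#define HID0_NAP_BIT (1u << 22)

/*
 * Build the 32-bit rlwinm mask for MB..ME in IBM bit numbering
 * (bit 0 is the most significant bit). When MB > ME the mask wraps
 * around, covering bits MB..31 and bits 0..ME.
 */
static uint32_t rlwinm_mask(unsigned mb, unsigned me)
{
	uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* ones in bits mb..31 */
	uint32_t to_me = 0xFFFFFFFFu << (31u - me);   /* ones in bits 0..me */

	return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Rotate left by sh bits; sh is reduced modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
	sh &= 31u;
	return sh ? (x << sh) | (x >> (32u - sh)) : x;
}

/* Emulate "rlwinm ra,rs,sh,mb,me": rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
	return rotl32(rs, sh) & rlwinm_mask(mb, me);
}
```

So rlwinm(x, 0, 10, 8) is just x & ~HID0_NAP_BIT, which is the usual idiom
for clearing a single bit without needing a scratch register for the mask.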
I did a bit more tracking and it's very strange.... At first, I
discovered that adding a printk after the call to power_save fixed it. I
did all sorts of tests and, if my memory serves me well, a simple mb()
there will fix it too. In fact, what I noticed is that if I do
if (mfmsr() & MSR_POW)
printk("GACK !\n");
after calling ppc_md.power_save() and before local_irq_enable(), it does
trigger! But with an mb() just before, it doesn't. In fact, you don't
need an mb()... all you need is another mfmsr(). Thus a dummy mfmsr()
"fixes" the stale MSR_POW in there.
That is very dodgy. Looks like we get a stale MSR_POW upon return from
the exception that interrupted sleep, causing the next
local_irq_enable() to block forever.
The next question is how does it get there... my idea at first was that
we get MSR_POW in SRR1 in that exception and put it back in with rfi
(and the CPU gets it that way instead of ignoring it). Sounds like a
lovely explanation if we also add that a sync or an mfmsr "clears" this
weird condition. However, I added clearing of MSR_POW in r9 in
EXCEPTION_PROLOG_2() and it didn't fix it (but maybe I did something
wrong, I was tired).