Re: Fwd: newb question : list of elligible computers to debian-68k

On Mon, 20 Oct 2008, Michael Schmitz wrote:

> Hi,
> First off - good work on getting this fault detected by the kernel! 

Thanks :-)

It was an interesting exercise to try to write code that's supposed to 
fail reliably!

> Maybe we can find a way to work around it as well.
> > > I wonder why. Is it not possible to distinguish a false ATC fault 
> > > from an unimplemented FP instruction exception?
> The ATC fault isn't false, it is rather premature, caused by concurrent 
> prefetch going on while the processor throws the F-line exception. We 
> would have to deal with it later on anyway.

I think the page fault normally precedes the F-line exception but when the 
FPU op spans the page boundary, the F-line exception goes missing... but I 
could be wrong.

> > We definitely know what type of fault happened. The only reason I can 
> > come up with that the A-line trap would be easier to fixup is that the 
> > exception frame for that is simpler than the unimplemented FP 
> > instruction exception frame. However, it's possible that Apple just 
> > didn't fix it because they didn't care. They recommended that any 
> > software detect the presence of the FPU and only use real FP opcodes 
> > if the chip had real hardware. There was a software floating point 
> > library that was to be used instead of opcode emulation.

Yes, it occurred to me also that the SANE library might have given them a 
good excuse not to attempt it.

> > I guess what Apple does is detect this special case of ATC fault, then 
> > if the bad instruction is one it knows how to fixup it manually 
> > creates the instruction emulation stack frame and does the emulation 
> > followed by the page fault code for the missing page. I believe an ATC 
> > fault would normally just cause the next page to be loaded.
> That's what I have been thinking - is there a way to fabricate the 
> F-line exception stack from what is available on the stack at the time 
> the page fault was taken? (From the errata, it appears as if the PC is 
> still pointing to the faulting instruction... maybe that can be used for 
> a quick check in the page fault handler? What is the situation with a 
> plain page fault?)

From the M68040 User's Manual, p. 8-7,

"[The processor] saves the vector offset, PC, and internal copy of the SR 
on the stack. The saved PC value is the logical address of the instruction 
executing at the time the fault was detected. This instruction is not 
necessarily the one that initiated the bus cycle since the processor 
overlaps execution of instructions."

According to the erratum, in this case the access error exception frame 
will show the fault address for the page fault (i.e. start of following 
page), rather than the fault address for the unimplemented instruction. And 
the saved PC will indicate the unimplemented instruction, whereas for a 
normal access error it would vary.

So for an ATC exception with a fault address at the start of a page, you 
have to figure out whether (1) an instruction at the page boundary was 
executing prior to the ATC exception (difficult when you can't rely on the 
saved PC?), (2) that instruction is an FPU op in need of emulation, and 
(3) there is no pending exception for this unimplemented instruction 
(since the bug doesn't guarantee that the second exception goes missing -- 
at least that's how it would appear from my test code...)
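
As a rough illustration, the three conditions above could be combined 
into a single predicate. This is only a sketch: the function names, the 
parameters, the 4 KiB page size and the notion of a known "F-line 
pending" flag are all assumptions made for the example, and it trusts the 
saved PC in the way the erratum describes, which may not hold in 
practice. The one hard fact it relies on is that FPU opwords are F-line 
opcodes with coprocessor ID 1, i.e. 0xF200-0xF3FF.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */

/* F-line opword with coprocessor ID 1 (the FPU): 0xF200-0xF3FF. */
static bool is_fpu_opword(uint16_t opword)
{
    return (opword & 0xFE00u) == 0xF200u;
}

/*
 * Hypothetical check, not taken from any kernel source.
 * Returns true if an access error might be hiding an unimplemented
 * FPU instruction exception:
 *   (1) the fault address is the start of a page and the saved PC
 *       lies in the page immediately before it,
 *   (2) the opword at the saved PC is an FPU op,
 *   (3) no F-line exception is already pending for it.
 */
static bool maybe_missing_fline(uint32_t fault_addr, uint32_t saved_pc,
                                uint16_t opword_at_pc, bool fline_pending)
{
    uint32_t mask = PAGE_SIZE_ASSUMED - 1u;

    if (fault_addr & mask)
        return false; /* fault is not at the start of a page */
    if ((saved_pc & ~mask) + PAGE_SIZE_ASSUMED != fault_addr)
        return false; /* PC is not in the preceding page */
    if (!is_fpu_opword(opword_at_pc))
        return false; /* not an FPU instruction */
    return !fline_pending;
}
```

Whether the caller can actually know fline_pending at that point is 
exactly the open question in (3); the sketch just takes it as an input.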

> Is the FPU instruction the very last on the page, or would there be 
> another one following it, potentially? Can we distinguish between this 
> situation and one where we just had random data at the end of the page 
> right before the faulting instruction (not that this should ever happen, 
> I guess)? Are FPU instructions ever executed from the stack?

I think this is the real problem... unless we know that this is only a 
prefetch issue.

> > That should work for investigation. Ideally we would have a runtime 
> > test that we could add to the kernel the way the x86 kernel tests for 
> > all the various intel bugs like the F00F bug and the Pentium fdiv bug.
> >
> > I think the demand paging is one page at a time, but I don't 
> > understand the memory management code well enough to be sure. Perhaps 
> > someone else can comment on that.
> Regardless of how many pages you page in at a time, you can still end up 
> with the situation that the last page has a potential FPU instruction at 
> the very end. What are you going to do in that case - take another page 
> fault, from within the page fault handler after checking the last page?

The number of pages mapped at once was relevant to coding my test case, 
but yes, it is not relevant to a workaround ... unless, during demand 
paging, you always check the last few bytes of any executable page that 
gets mapped, in case an unimplemented FPU op there might cause a page 
fault during prefetch (if prefetch is in fact critical here). And if that 
can happen, you map an extra page (and check again).
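
To make the idea concrete, here is a minimal sketch of the "check the 
tail, map an extra page, check again" loop. Everything in it is 
hypothetical: the function names, the flat image buffer standing in for 
the executable's pages, and the tail_bytes parameter (how many trailing 
bytes count as "the last few") are assumptions for illustration, not 
kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit word, as the 68k would fetch it. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* True if any word in the last tail_bytes of the page looks like an
 * FPU opword (F-line, coprocessor ID 1: 0xF200-0xF3FF). */
static bool tail_may_hold_fpu_op(const uint8_t *page, size_t page_size,
                                 size_t tail_bytes)
{
    if (tail_bytes > page_size)
        tail_bytes = page_size;

    /* Start on an even offset; 68k instructions are word aligned. */
    size_t start = (page_size - tail_bytes) & ~(size_t)1;

    for (size_t off = start; off + 2 <= page_size; off += 2) {
        if ((read_be16(page + off) & 0xFE00u) == 0xF200u)
            return true;
    }
    return false;
}

/* Number of consecutive pages to map, starting at page `first`, so
 * that the last page mapped has a safe tail (or the image ends). */
static size_t pages_to_map(const uint8_t *image, size_t npages,
                           size_t first, size_t page_size,
                           size_t tail_bytes)
{
    size_t count = 0;

    for (size_t i = first; i < npages; i++) {
        count++;
        if (!tail_may_hold_fpu_op(image + i * page_size, page_size,
                                  tail_bytes))
            break;
    }
    return count;
}
```

Note that a raw scan like this cannot tell instruction boundaries from 
operand words or data, so it will report false positives; it errs on the 
side of mapping too much.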

And then I suppose you have to complicate page eviction with the same test 
(i.e. evict the previous page until all the resident ones are safe).


> 	Michael
