Hi,

First off - good work on getting this fault detected by the kernel! Maybe we can find a way to work around it as well.
I wonder why. Is it not possible to distinguish a false ATC fault from an unimplemented FP instruction exception?
The ATC fault isn't false; rather, it is premature, caused by a concurrent prefetch going on while the processor throws the F-line exception. We would have to deal with it later on anyway.
We definitely know what type of fault happened. The only reason I can come up with that the A-line trap would be easier to fix up is that the exception frame for that is simpler than the unimplemented FP instruction exception frame. However, it's possible that Apple just didn't fix it because they didn't care. They recommended that any software detect the presence of the FPU and only use real FP opcodes if the chip had real hardware. There was a software floating point library that was to be used instead of opcode emulation. I guess what Apple does is detect this special case of ATC fault; then, if the bad instruction is one it knows how to fix up, it manually creates the instruction emulation stack frame and does the emulation, followed by the page fault code for the missing page. I believe an ATC fault would normally just cause the next page to be loaded.
That's what I have been thinking - is there a way to fabricate the F-line exception stack from what is available on the stack at the time the page fault was taken? (From the errata, it appears as if the PC is still pointing to the faulting instruction... maybe that can be used for a quick check in the page fault handler? What is the situation with a plain page fault?)
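To make the "quick check" idea concrete, here is a rough, untested sketch in plain C (not kernel code). It relies only on the 680x0 encoding convention that F-line opcodes have 1111 in the top four bits of the first instruction word and A-line opcodes have 1010. The helper names are made up for illustration, and how the handler would safely fetch the opcode word at the saved PC is left open:

```c
#include <stdint.h>

/* Hypothetical helpers, not existing kernel functions.
 * 'opword' is assumed to be the first 16-bit instruction word
 * found at the PC saved in the page fault frame. */

/* F-line opcodes (coprocessor/FPU space): bits 15-12 are 1111. */
int is_fline_opcode(uint16_t opword)
{
    return (opword & 0xF000) == 0xF000;
}

/* A-line opcodes: bits 15-12 are 1010. */
int is_aline_opcode(uint16_t opword)
{
    return (opword & 0xF000) == 0xA000;
}
```

This only tells us the word at the PC looks like an F-line instruction; it says nothing yet about whether the fault address belongs to the prefetch or to the instruction's own operands.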
Is the FPU instruction the very last on the page, or would there be another one following it, potentially? Can we distinguish between this situation and one where we just had random data at the end of the page right before the faulting instruction (not that this should ever happen, I guess)? Are FPU instructions ever executed from the stack?
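For the "last on the page" question, a possible test is whether the instruction at the saved PC could reach the end of its page. Again an untested sketch in plain C: the function name is invented, and both the page size and the maximum instruction length are parameters, because I am not sure which values apply here (the page size is assumed to be a power of two):

```c
#include <stdint.h>

/* Hypothetical helper, not an existing kernel function.
 * Returns 1 if an instruction of up to 'max_len' bytes starting at 'pc'
 * would extend past the end of the page containing 'pc', 0 otherwise.
 * 'page_size' must be a power of two (assumption). */
int insn_may_cross_page(uint32_t pc, uint32_t max_len, uint32_t page_size)
{
    uint32_t offset = pc & (page_size - 1);   /* offset of pc within its page */

    return offset + max_len > page_size;
}
```

For example, with 4096-byte pages, a 4-byte instruction at 0x1FFC ends exactly at the page boundary and is not flagged, while the same instruction at 0x1FFE is. This cannot tell a real FPU instruction from random data that happens to look like one, so it only narrows down the cases worth a closer look.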
That should work for investigation. Ideally we would have a runtime test that we could add to the kernel, the way the x86 kernel tests for all the various Intel bugs like the F00F bug and the Pentium FDIV bug. I think the demand paging is one page at a time, but I don't understand the memory management code well enough to be sure. Perhaps someone else can comment on that.
Regardless of how many pages you page in at a time, you can still end up with the situation that the last page has a potential FPU instruction at the very end. What are you going to do in that case - take another page fault, from within the page fault handler after checking the last page?
Michael