
Re: checking the release-critical bugs



Phil Oleson writes:
 >  I got this error from milo and
 > normal usage from shells (grabbed from my syslog):
 > 
 > Nov 22 07:40:48 localhost kernel: lca: machine check (la=0xfffffc00002084d0,pc=0xfffffc0000420d78)
 > Nov 22 07:40:48 localhost kernel:   Reason: access to non-existent memory (long frame):
 > Nov 22 07:40:48 localhost kernel:     reason: fffffc02017a7140 exc_addr: fffffc0000420d78  dc_stat: 7
 > Nov 22 07:40:48 localhost kernel:     car: 40441141
 > Nov 22 07:40:48 localhost kernel:     CPU initiated PCI Memory Read cycle to address bd388 failed due to target abort.

I am not really knowledgeable about this, but as far as I know the
machine_check function is called when an unexpected hardware event
occurs: for instance a memory parity error, or an access to
nonexistent physical memory or nonexistent I/O memory.

In your case the error occurs at pc = 0xfffffc0000420d78. Could you
check which function, and which block of code, that address
corresponds to? The System.map file should give a hint.
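
To make that lookup concrete, here is a rough sketch (in Python, only
as an illustration). It assumes System.map has the usual "address type
name" layout with hex addresses, and it picks the last symbol at or
below the pc. The names parse_system_map and lookup are mine, and the
sample entries are made up, not taken from any real kernel build.

```python
import bisect


def parse_system_map(text):
    """Return a sorted list of (address, name) from System.map-style text.

    Assumes each useful line looks like: "<hex address> <type> <name>".
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue  # skip anything that does not start with a hex address
        symbols.append((addr, parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, pc):
    """Return (name, offset) for the last symbol at or below pc, else None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


# Made-up entries for illustration only (hypothetical symbol names).
SAMPLE_MAP = """\
fffffc0000420c00 T example_func_a
fffffc0000420d00 T example_func_b
fffffc0000420e00 T example_func_c
"""

if __name__ == "__main__":
    syms = parse_system_map(SAMPLE_MAP)
    print(lookup(syms, 0xFFFFFC0000420D78))
```

With a real System.map you would read the file and pass its contents to
parse_system_map; the offset returned tells you how far into the
function the faulting instruction is.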

As I understand the message, the error reported is that while reading
PCI memory at address 0xbd388 (which is in the normal VGA text mode
range), the target (so probably the video card) decided to abort
the transaction for some reason of its own. Maybe some timing in
the DEC implementation of PCI is not accepted by the video card
(just in case, check that the card is properly seated in its slot).
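
For what it is worth, the "VGA text mode range" remark can be checked
with a little arithmetic. The sketch below assumes the standard colour
text window of 0xb8000 to 0xbffff (32k); the constant and function
names are mine.

```python
# Standard VGA colour text window (assumed here): 0xb8000 .. 0xbffff, 32k.
VGA_TEXT_BASE = 0xB8000
VGA_TEXT_SIZE = 32 * 1024


def in_vga_text_window(addr):
    """True if addr falls inside the 32k colour text window."""
    return VGA_TEXT_BASE <= addr < VGA_TEXT_BASE + VGA_TEXT_SIZE


def vga_text_offset(addr):
    """Byte offset of addr from the start of the text window."""
    if not in_vga_text_window(addr):
        raise ValueError("address is outside the VGA text window")
    return addr - VGA_TEXT_BASE


if __name__ == "__main__":
    addr = 0xBD388  # address from the syslog message
    print(in_vga_text_window(addr), hex(vga_text_offset(addr)))
```

So the failing read is 0x5388 bytes into the window, well past the
first 4000 bytes that a single 80x25 screen occupies.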

Could you tell us the kernel version you are using, and also check
whether the addresses in all instances of the message "CPU initiated
PCI Memory Read cycle to address ..." follow a particular pattern?
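
If there are many of these messages, something like the following
sketch could pull the addresses out of the syslog and count them. It
assumes every message has the same "cycle to address <hex>" wording as
the one quoted above; the function names are mine, and the only sample
line used is the one from your mail.

```python
import re
from collections import Counter

# Matches the hex address in "... cycle to address bd388 failed ...".
ADDRESS_RE = re.compile(r"cycle to address ([0-9a-fA-F]+)")


def extract_addresses(lines):
    """Return the list of addresses (as ints) found in the given log lines."""
    found = []
    for line in lines:
        match = ADDRESS_RE.search(line)
        if match:
            found.append(int(match.group(1), 16))
    return found


def summarize(addresses):
    """Return (address, count) pairs, most frequent address first."""
    return Counter(addresses).most_common()


if __name__ == "__main__":
    # The single line quoted in the original report.
    sample = [
        "Nov 22 07:40:48 localhost kernel:     CPU initiated PCI Memory "
        "Read cycle to address bd388 failed due to target abort.",
    ]
    for addr, count in summarize(extract_addresses(sample)):
        print(hex(addr), count)
```

Addresses that repeat, or that all fall in one part of the text
buffer, would be a useful hint.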

Maybe there is some chipset setting that could give more info, but
this kind of situation may be due to some subtle hardware
incompatibility and may be difficult to diagnose without a PCI logic
analyser. Or maybe someone who has access to the Matrox specs, or is
more knowledgeable about PCI in general, knows a way to find out
more about the cause of the abort.

Using only a smaller part of the video text memory, rather than the
standard 32k, may also help...

Regards,

Loic

