Re: How to interpret machine check exception
Just in case someone else searches the mailing lists...
Ralf Schmitt wrote:
Hi,
One of our Opteron-based machines here at work keeps crashing (with
kernel 2.6.8/2.6.10). Last thing it prints on the console is (hand
transcript):
CPU0: Machine Check Exception 4 Bank 0: b60ea00000000833
TSC 6e5cd030ae71
ADDR 258f8640
I've downloaded parsemce.c 0.0.9 from http://codemonkey.org.uk/cruft/.
But I'm not sure about the correct way to call it (or if it even works
for amd64).
I still don't know, but one can use the recently released mcelog 0.3:
ralf@dumbo:~$ cat mce.txt
CPU 0: Machine Check Exception: 4 Bank 0: b60ea00000000833
TSC 3151881f80cc
Kernel panic - not syncing: Machine check
ralf@dumbo:~$ /usr/sbin/mcelog --k8 --ascii <mce.txt
CPU 0 0 data cache TSC 3151881f80cc
Data cache ECC error (syndrome 1d)
bit45 = uncorrected ecc error
bit57 = processor context corrupt
bit61 = error uncorrected
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS b60ea00000000833 MCGSTATUS 4
Kernel panic - not syncing: Machine check
- Ralf