
Re: How to interprete machine check exception



Just in case someone else searches the mailing lists...

Ralf Schmitt wrote:
Hi,

One of our Opteron-based machines here at work keeps crashing (with kernel 2.6.8/2.6.10). The last thing it prints on the console is (transcribed by hand):

CPU0: Machine Check Exception       4 Bank 0: b60ea00000000833
TSC 6e5cd030ae71
ADDR 258f8640

I've downloaded parsemce.c 0.0.9 from http://codemonkey.org.uk/cruft/. But I'm not sure about the correct way to call it (or if it even works for amd64).

I still don't know whether parsemce works here, but the recently released mcelog 0.3 can decode the message:

ralf@dumbo:~$ cat mce.txt
CPU 0: Machine Check Exception:                4 Bank 0: b60ea00000000833
TSC 3151881f80cc
Kernel panic - not syncing: Machine check
ralf@dumbo:~$ /usr/sbin/mcelog --k8 --ascii  <mce.txt
CPU 0 0 data cache TSC 3151881f80cc
  Data cache ECC error (syndrome 1d)
       bit45 = uncorrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS b60ea00000000833 MCGSTATUS 4
Kernel panic - not syncing: Machine check
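
For anyone who wants to cross-check the mcelog output by hand, here is a rough Python sketch that pulls the same fields out of the STATUS value. The bit positions (flag bits, syndrome in bits 54:47, bus error code pattern 0000 1PPT RRRR IILL in the low 16 bits) are my assumption from reading the K8 machine check documentation and happen to agree with the output above; the function name decode_status and the label tables are made up for illustration and are not part of mcelog.

```python
# Sketch only: decode a K8 MCi_STATUS value by hand.
# All bit positions below are assumptions (see the note above), not mcelog code.

STATUS_FLAGS = {
    63: "valid",
    62: "overflow",
    61: "error uncorrected",
    60: "error reporting enabled",
    59: "misc register valid",
    58: "address register valid",
    57: "processor context corrupt",
    46: "corrected ecc error",
    45: "uncorrected ecc error",
}

# Field tables for the assumed bus error code layout 0000 1PPT RRRR IILL.
PARTICIPATION = ["local node origin", "local node response",
                 "local node observed", "generic"]
REQUEST = ["generic", "generic read", "generic write", "data read",
           "data write", "instruction fetch", "prefetch", "evict", "snoop"]
MEMORY_IO = ["memory access", "reserved", "i/o", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "level generic"]


def decode_status(status):
    """Return a dict with the flags, syndrome and error code of a status word."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items())
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "syndrome": (status >> 47) & 0xFF,  # assumed: bits 54:47
        "error_code": code,
    }
    # Bus errors are assumed to have the top five bits of the code equal to 00001.
    if code & 0xF800 == 0x0800:
        rrrr = (code >> 4) & 0xF
        result["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timed_out": bool((code >> 8) & 0x1),
            "request": REQUEST[rrrr] if rrrr < len(REQUEST) else "reserved",
            "memory_io": MEMORY_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_status(0xB60EA00000000833)
    print("syndrome %x" % decoded["syndrome"])
    for flag in decoded["flags"]:
        print("  " + flag)
    print(decoded.get("bus"))
```

Run on b60ea00000000833 this should agree with what mcelog reports: bits 45, 57 and 61 set, and a data read from memory that did not time out. For anything beyond a sanity check, trust mcelog rather than this sketch.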


- Ralf


