Re: Please help - kernel crashes often

Yes, I was able to go down to the console, record the message, and
find a thread on how to decipher it.

The MCE was:

CPU 0: Machine Check Exception:         4 Bank 0: f60da00000000833
TSC 23fd7acec1e ADDR 797db2c0
Kernel panic - not syncing: Machine check

The output from "mcelog" was:

web03:~# mcelog --k8 --ascii <mce.txt
CPU 0 0 data cache TSC 23fd7acec1e
  Data cache ECC error (syndrome 1b)
       bit45 = uncorrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS f60da00000000833 MCGSTATUS 4
Kernel panic - not syncing: Machine check
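For anyone who wants to check mcelog's decoding by hand, here's a rough Python sketch that splits the raw STATUS and MCGSTATUS values into their bit fields. The bit positions are my assumption. They come from the generic x86 machine-check layout (bits 57-63, the bus-error code in bits 15:0, MCGSTATUS bits 0-2) plus the K8-specific ECC fields (bits 45/46 and the syndrome in bits 54:47) from AMD's K8 docs, so double-check them against the BKDG for your CPU revision. The function and table names are made up for illustration and are not part of mcelog.

```python
# Hypothetical helper: split an MCi_STATUS value into the fields mcelog reports.
# ASSUMPTION: bits 57-63 follow the generic x86 MCA layout; bits 45/46 and the
# ECC syndrome (bits 54:47) follow AMD K8 docs. Verify for your CPU revision.

STATUS_FLAGS = {
    63: "VAL (valid)",
    62: "OVER (error overflow)",
    61: "UC (error uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
    46: "CECC (corrected ECC error, K8)",
    45: "UECC (uncorrected ECC error, K8)",
}

# Bus-error MCA code format (bits 15:0): 0000 1PPT RRRR IILL
PP = {0: "local node origin", 1: "local node response",
      2: "local node observed", 3: "generic participation"}
RRRR = {0: "generic", 1: "read", 2: "write", 3: "data read", 4: "data write",
        5: "instruction fetch", 6: "prefetch", 7: "evict", 8: "snoop"}
II = {0: "memory access", 1: "reserved", 2: "I/O", 3: "generic"}
LL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "level generic"}

MCG_FLAGS = {0: "RIPV (restart IP valid)", 1: "EIPV (error IP valid)",
             2: "MCIP (machine check in progress)"}


def bit(value: int, n: int) -> int:
    """Return bit n of value (0 or 1)."""
    return (value >> n) & 1


def decode_status(status: int) -> dict:
    """Decode a 64-bit MCi_STATUS value into flags, syndrome and error code."""
    flags = [name for pos, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if bit(status, pos)]
    syndrome = (status >> 47) & 0xFF   # K8 ECC syndrome, bits 54:47
    code = status & 0xFFFF             # MCA error code, bits 15:0
    result = {"flags": flags, "syndrome": syndrome, "mca_code": code}
    if code & 0xF800 == 0x0800:        # bus error: top five bits are 00001
        result["bus_error"] = {
            "participation": PP[(code >> 9) & 0x3],
            "timeout": bool(bit(code, 8)),
            "request": RRRR.get((code >> 4) & 0xF, "reserved"),
            "mem_or_io": II[(code >> 2) & 0x3],
            "level": LL[code & 0x3],
        }
    return result


def decode_mcg_status(mcg: int) -> list:
    """Decode the low bits of MCG_STATUS."""
    return [name for pos, name in sorted(MCG_FLAGS.items()) if bit(mcg, pos)]


if __name__ == "__main__":
    print(decode_status(0xF60DA00000000833))
    print(decode_mcg_status(0x4))
```

Fed the STATUS value from the panic above, it reports the uncorrected-ECC, processor-context-corrupt, uncorrected and overflow bits, syndrome 0x1b, and a bus-error "data read" from local node origin, which lines up with what mcelog printed. It also shows ADDRV set, so the ADDR 797db2c0 in the console message should be meaningful. MCGSTATUS 4 decodes to just MCIP (machine check in progress).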

Starting earlier today, I've been running memtest86 (v3.3, if I recall
the version correctly) on all the machines, and I'll check the results
in the next day or two.

One thing that bothers me is that memtest shows "ECC: no", even when I
force ECC on, and the RAM is definitely ECC...

On 1/30/06, Anthony DeRobertis <anthony@derobert.net> wrote:
> ECC failures will generate MCE's. The MCE message *should* provide some
> hint as to what is wrong.

