Re: Please help - kernel crashes often
Yeah, I looked at the memory. It has a PQI sticker on it (at least one
set does). It's SUPPOSED to be "(Samsung, Micron, Elpida, Infineon,
Hynix OEM)", which would basically bring it in line with what
Supermicro suggests:
Anyway, I called Supermicro. I'm going to order their recommended
heatsink and air shroud, and then call the vendor I got the RAM from
and tell them they did not deliver the proper stuff. They'll put up a
fight, because they don't do good business - so tomorrow looks to be
fun.
Hopefully between those two any cooling and any RAM issues will be out
of the equation.
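
For anyone who wants to double-check the mcelog decode quoted below, here is a rough Python sketch that pulls the set bits out of the STATUS value by hand. The bit labels are copied from the mcelog output itself. The syndrome layout (low byte in bits 54-47, high byte in bits 31-24) is my assumption about the K8 northbridge status register, not something I have confirmed against the AMD docs, although it does reproduce the 6ca0 that mcelog printed.

```python
# STATUS value copied from the quoted mcelog output.
STATUS = 0xB65020016C080813

# Labels exactly as mcelog printed them for this event.
LABELS_FROM_LOG = {
    32: "err cpu0",
    45: "uncorrected ecc error",
    57: "processor context corrupt",
    61: "error uncorrected",
}


def set_bits(value, width=64):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


def chipkill_syndrome(status):
    """Rebuild the 16-bit syndrome.

    ASSUMPTION: low byte lives in bits 54-47 and high byte in bits 31-24.
    """
    low = (status >> 47) & 0xFF
    high = (status >> 24) & 0xFF
    return (high << 8) | low


for bit in set_bits(STATUS):
    # Bits mcelog did not name are shown without a label rather than guessed.
    print(f"bit{bit:<2} {LABELS_FROM_LOG.get(bit, '')}")

print(f"syndrome = {chipkill_syndrome(STATUS):04x}")
```

This only confirms that mcelog decoded the register consistently. It says nothing about which DIMM is actually at fault.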
On 2/1/06, Paul Brook <firstname.lastname@example.org> wrote:
> On Wednesday 01 February 2006 16:47, mike wrote:
> > After running memtest86 (V3.3) for at least 24 hours, I came back and
> > saw that each machine completed 61-63 cycles of tests, with 0
> > errors...
> > However, I did look through the BIOS for cache disabling - and it
> > doesn't appear I can disable the CPU cache.
> > I did turn on chipkill and some other supposed ECC memory "helpers"
> > and instantly had the machine crash twice.
> > [root@lvs01 ~]# mcelog --k8 --ascii <mce2.txt
> > CPU 0 4 northbridge TSC 2
> > Northbridge Chipkill ECC error
> > Chipkill ECC syndrome = 6ca0
> > bit32 = err cpu0
> > bit45 = uncorrected ecc error
> > bit57 = processor context corrupt
> > bit61 = error uncorrected
> > bus error 'local node origin, request didn't time out
> > generic read mem transaction
> > memory access, level generic'
> > STATUS b65020016c080813 MCGSTATUS 4
> > 332ff8453 ADDR 7ff5faf0
> > Kernel panic - not syncing: Machine check
> I had something similar, and it turned out the motherboard just didn't like
> the brand/model of memory I was using. Replacing it with a different make
> (this time one that was on the motherboard's recommended list) fixed the