
Re: Please help - kernel crashes often



Yeah, I looked at the memory. It's got a PQI sticker (at least one set).

It's SUPPOSED to be "(Samsung, Micron, Elpida, Infineon, Hynix OEM)" -
which would align it basically with what Supermicro suggests:

http://supermicro.com/Aplus/support/resources/memory/?sz=1.0&mspd=0.4&mtyp=9&id=51EF70624CA791283EC434A52DA0D4E2

Anyway, I called Supermicro. I'm going to order their recommended
heatsink and air shroud, and then call the vendor I got the RAM from
and tell them they did not deliver what was specified. They'll put up
a fight, because they don't do good business - so tomorrow looks to
be fun.

Hopefully, between those two, both cooling and RAM issues will be out
of the equation.

On 2/1/06, Paul Brook <paul@codesourcery.com> wrote:
> On Wednesday 01 February 2006 16:47, mike wrote:
> > After running memtest86 (V3.3) for at least 24 hours, I came back and
> > saw that each machine completed 61-63 cycles of tests, with 0
> > errors...
> >
> > However, I did look through the BIOS for cache disabling - and it
> > doesn't appear I can disable the CPU cache.
> >
> > I did turn on chipkill and some other supposed ECC memory "helpers"
> > and instantly had the machine crash twice.
> >
> > [root@lvs01 ~]# mcelog --k8 --ascii <mce2.txt
> > CPU 0 4 northbridge TSC 2
> >   Northbridge Chipkill ECC error
> >   Chipkill ECC syndrome = 6ca0
> >        bit32 = err cpu0
> >        bit45 = uncorrected ecc error
> >        bit57 = processor context corrupt
> >        bit61 = error uncorrected
> >   bus error 'local node origin, request didn't time out
> >       generic read mem transaction
> >       memory access, level generic'
> > STATUS b65020016c080813 MCGSTATUS 4
> > 332ff8453 ADDR 7ff5faf0
> > Kernel panic - not syncing: Machine check
>
> I had something similar, and it turned out the motherboard just didn't like
> the brand/model of memory I was using. Replacing it with a different make
> (this time one that was on the motherboard's recommended list) fixed the
> problem.
>
> Paul
>
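
For reference, the flags in the quoted mcelog output can be cross-checked
against the raw STATUS value by hand. The following is a rough sketch, not
what mcelog itself does internally. The bit labels are copied from the
output above; the syndrome field positions (bits 54:47 for the low byte,
bits 31:24 for the high byte) are my assumption about the K8 northbridge
status register layout, and the helper names are made up for illustration.

```python
# Raw value copied from the STATUS line of the quoted mcelog output.
STATUS = 0xb65020016c080813

# Bit labels exactly as mcelog printed them in the quoted output.
REPORTED_BITS = {
    32: "err cpu0",
    45: "uncorrected ecc error",
    57: "processor context corrupt",
    61: "error uncorrected",
}


def bit_is_set(value, bit):
    """Return True if the given bit position is set in value."""
    return (value >> bit) & 1 == 1


def chipkill_syndrome(status):
    """Reassemble the 16-bit syndrome from the status word.

    Assumption (not confirmed by the thread): the low syndrome byte
    sits in bits 54:47 and the high byte in bits 31:24.
    """
    low = (status >> 47) & 0xFF
    high = (status >> 24) & 0xFF
    return (high << 8) | low


if __name__ == "__main__":
    for bit, label in sorted(REPORTED_BITS.items()):
        state = "set" if bit_is_set(STATUS, bit) else "clear"
        print("bit%d (%s): %s" % (bit, label, state))
    print("syndrome = %x" % chipkill_syndrome(STATUS))
```

With the value above, all four reported bits come out set and the
reassembled syndrome is 6ca0, which agrees with the "Chipkill ECC
syndrome" line in the quoted output.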


