
Re: Test ECC memory



Anssi Saari wrote: 
> Dan Ritter <dsr@randomstring.org> writes:
> 
> > We see ECC errors irregularly and infrequently on both Intel and
> > AMD CPUs.
> 
> How/where do you see those on a Debian system? I looked into this
> briefly but didn't get anywhere.


The kernel announces readiness during boot with:
dmesg:[   18.331561] EDAC amd64: Node 0: DRAM ECC enabled.

and then an event looks like this:
Message from syslogd@HOSTNAME at Jan 25 15:05:51 ...
kernel:[5964975.397283] [Hardware Error]: Corrected error, no
action required.

Message from syslogd@HOSTNAME at Jan 25 15:05:51 ...
kernel:[5964975.406226] [Hardware Error]: CPU:0 (15:2:0)
MC4_STATUS[-|CE|MiscV|-|AddrV|-|-|CECC]: 0x9c04400040080a13

Message from syslogd@HOSTNAME at Jan 25 15:05:51 ...
kernel:[5964975.418574] [Hardware Error]: Error Addr:
0x0000001ed405ef50

Message from syslogd@HOSTNAME at Jan 25 15:05:51 ...
kernel:[5964975.426919] [Hardware Error]: MC4 Error (node 0):
DRAM ECC error detected on the NB.

Message from syslogd@HOSTNAME at Jan 25 15:05:51 ...
kernel:[5964975.437370] [Hardware Error]: cache level: L3/GEN,
mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
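For anyone who wants to count these events rather than read them off the console, here is a rough Python sketch that pulls the [Hardware Error] records and any reported error addresses out of saved kernel log text. It assumes each record sits on one line, as in the real log (the excerpts above are wrapped by the mail client), and the function names are made up for illustration.

```python
import re

# Kernel timestamp in brackets, followed by the [Hardware Error] tag.
HWERR_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[Hardware Error\]:\s*(?P<msg>.*)"
)
# Physical address reported alongside a memory error.
ADDR_RE = re.compile(r"Error Addr:\s*(0x[0-9a-fA-F]+)")


def hardware_error_messages(log_text):
    """Return (timestamp, message) tuples for each [Hardware Error] line."""
    entries = []
    for line in log_text.splitlines():
        match = HWERR_RE.search(line)
        if match:
            entries.append((float(match.group("ts")), match.group("msg").strip()))
    return entries


def error_addresses(log_text):
    """Return the reported error addresses as integers, in log order."""
    addresses = []
    for _, message in hardware_error_messages(log_text):
        match = ADDR_RE.search(message)
        if match:
            addresses.append(int(match.group(1), 16))
    return addresses
```

Feeding it the output of dmesg or the contents of a kernel log file should work the same way; the same address turning up again and again is worth noting when deciding whether a DIMM is suspect.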


If you see a bunch of these, install the edac-utils package and
run edac-util to see whether the errors point to a bad DIMM.
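The counters behind those reports are also exposed by the kernel's EDAC driver in sysfs, so a quick check is possible without extra packages. The sketch below is only an illustration: it assumes the standard /sys/devices/system/edac/mc layout with ce_count (corrected) and ue_count (uncorrected) files per memory controller, and read_edac_counts is a made-up name.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int, "ue_count": int}} from sysfs.

    A counter that is missing or unreadable is reported as None.
    An empty dict means no EDAC memory controllers were found under root.
    """
    counts = {}
    for controller in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((controller / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None
        counts[controller.name] = entry
    return counts
```

Calling read_edac_counts() with no arguments reads the live counters. A ce_count that keeps climbing on one controller is the sort of pattern that suggests a failing DIMM, while any nonzero ue_count deserves immediate attention.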

-dsr-

