
machine checks on Dell R815 under jessie



I upgraded four Dell R815s from wheezy to jessie a few weeks ago. Prior to the
upgrade, they had run reliably for about five years. Since the upgrade, two
machines have been getting periodic machine checks. The machines boot fine and
run for a day or more. The machine checks appear to happen sporadically. I
can't determine a correlation with anything in particular.

The front panel on the first machine says the machine check was on CPU #4. The
front panel on the second machine said the first machine check was on CPU #1
and the second machine check was on CPU #2.

I doubt that this is really a hardware problem. Three CPUs began exhibiting
machine checks within a few weeks of each other, all immediately after the
upgrade from wheezy to jessie, after working reliably for five years.

Has anybody else encountered this issue? Any suggestions on how to debug and
fix it?

    Thanks,
    Jeff (http://engineering.purdue.edu/~qobi)
-------------------------------------------------------------------------------
root@arivu:~# ipmitool sel elist
   1 | 08/05/2016 | 00:12:47 | Event Logging Disabled SEL | Log area reset/cleared | Asserted
   2 | 08/06/2016 | 11:35:17 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
   3 | 08/06/2016 | 11:35:17 | Unknown #0x28 |  | Asserted
   4 | 08/06/2016 | 11:35:18 | Unknown #0x28 |  | Asserted
   5 | 08/06/2016 | 11:35:18 | Unknown #0x28 |  | Asserted
   6 | 08/06/2016 | 11:35:18 | Unknown #0x28 |  | Asserted
   7 | 08/06/2016 | 11:35:18 | Unknown #0x28 |  | Asserted
   8 | 08/06/2016 | 11:35:19 | Unknown #0x28 |  | Asserted
   9 | 08/06/2016 | 11:35:19 | Unknown #0x28 |  | Asserted
   a | 08/06/2016 | 11:35:19 | Unknown #0x28 |  | Asserted
root@arivu:~# 

root@perisikan:~# ipmitool sel elist
[...]
  1c | 08/08/2016 | 12:23:02 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
  1d | 08/08/2016 | 12:23:03 | Unknown #0x28 |  | Asserted
  1e | 08/08/2016 | 12:23:03 | Unknown #0x28 |  | Asserted
  1f | 08/08/2016 | 12:23:03 | Unknown #0x28 |  | Asserted
  20 | 08/08/2016 | 12:23:03 | Unknown #0x28 |  | Asserted
  21 | 08/08/2016 | 12:23:03 | Unknown #0x28 |  | Asserted
  22 | 08/08/2016 | 12:23:04 | Unknown #0x28 |  | Asserted
  23 | 08/08/2016 | 12:23:04 | Unknown #0x28 |  | Asserted
  24 | 08/08/2016 | 12:23:04 | Unknown #0x28 |  | Asserted
  25 | 08/09/2016 | 18:37:46 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
  26 | 08/09/2016 | 18:37:46 | Unknown #0x28 |  | Asserted
  27 | 08/09/2016 | 18:37:47 | Unknown #0x28 |  | Asserted
  28 | 08/09/2016 | 18:37:47 | Unknown #0x28 |  | Asserted
  29 | 08/09/2016 | 18:37:47 | Unknown #0x28 |  | Asserted
  2a | 08/09/2016 | 18:37:47 | Unknown #0x28 |  | Asserted
  2b | 08/09/2016 | 18:37:48 | Unknown #0x28 |  | Asserted
  2c | 08/09/2016 | 18:37:48 | Unknown #0x28 |  | Asserted
  2d | 08/09/2016 | 18:37:48 | Unknown #0x28 |  | Asserted
root@perisikan:~#
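
To look for a pattern in when the machine checks occur, the SEL output above
could be reduced to just the machine-check entries. What follows is only a
minimal sketch, assuming the pipe-delimited six-column layout shown in the logs
above and hexadecimal record IDs; the names parse_sel, machine_checks and
SAMPLE are hypothetical helpers, not part of ipmitool.

```python
def parse_sel(text):
    """Parse `ipmitool sel elist` output into a list of event dicts.

    Assumes each event line has six '|'-separated columns, as in the
    logs above: record ID (hex), date, time, sensor, event, state.
    """
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue  # skip shell prompts, "[...]" and blank lines
        rec_id, date, time, sensor, event, state = fields
        events.append({
            "id": int(rec_id, 16),  # record IDs are assumed to be hex
            "date": date,
            "time": time,
            "sensor": sensor,
            "event": event,
            "state": state,
        })
    return events


def machine_checks(events):
    """Keep only the entries whose sensor column mentions a machine check."""
    return [e for e in events if "Machine Chk" in e["sensor"]]


# A few lines copied from the perisikan log above.
SAMPLE = """\
  1c | 08/08/2016 | 12:23:02 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
  1d | 08/08/2016 | 12:23:03 | Unknown #0x28 |  | Asserted
  25 | 08/09/2016 | 18:37:46 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
"""

if __name__ == "__main__":
    for e in machine_checks(parse_sel(SAMPLE)):
        print(e["date"], e["time"], e["sensor"])
```

The resulting timestamps can then be compared against syslog or cron activity
on each host to check for a correlation.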

