Bug#637190: linux-image-2.6.32-5-amd64: Random kernel panics & general protection faults



On Tue, 2011-08-09 at 14:55 +0200, Simon Morvan wrote:
> On 09/08/2011 14:50, Ben Hutchings wrote:
> > On Tue, 2011-08-09 at 12:22 +0200, Simon Morvan wrote:
> >> Package: linux-2.6
> >> Version: 2.6.32-35
> >> Severity: grave
> >> Justification: renders package unusable
> >>
> >>
> >> This system is a standard PC loaded with 15 SATA disks:
> >> 8 on an LSI RAID card,
> >> the remainder on standard SATA ports on the motherboard.
> >> This is primarily a NAS for the LAN (Samba & netatalk).
> >>
> >> We're getting random crashes of the system (panics, GPFs). The stack trace is always different.
> > Can you check that the power supply is sufficient for all these disks?
> Do you have any recommendations? I haven't found much information on how
> to estimate the power requirement. Currently this is a 600 W power supply
> (FWIW: Cooler Master Silent Pro M - 600W)

Many motherboards have a voltage monitoring chip, which you should be
able to read with the 'sensors' command from the 'lm-sensors' package.
This should show whether the actual voltages are being pulled down
because the power supply is overloaded.  You would need to actually make
all the hard drives active while checking this.
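
As a rough illustration of the check described above, here is a minimal Python sketch that reads the raw voltage inputs the kernel exposes under /sys/class/hwmon (the same data 'sensors' reports). The function names and the 5% tolerance are assumptions chosen for this example. The raw values are in millivolts as seen at the chip's pins, so a rail such as +12 V may need the per-board scaling that 'sensors' applies from its configuration before the numbers are meaningful.

```python
from pathlib import Path


def read_voltages(hwmon_root="/sys/class/hwmon"):
    """Return {(chip, label): volts} for every in*_input file found.

    Values are the raw pin voltages; no board-specific scaling is applied.
    """
    readings = {}
    for chip_dir in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("in*_input")):
            channel = input_file.name[: -len("_input")]  # e.g. "in0"
            label_file = chip_dir / f"{channel}_label"
            label = label_file.read_text().strip() if label_file.exists() else channel
            try:
                millivolts = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # skip channels that cannot be read or parsed
            readings[(chip, label)] = millivolts / 1000.0
    return readings


def out_of_tolerance(measured, nominal, tolerance=0.05):
    """True if 'measured' deviates from 'nominal' by more than 'tolerance'.

    The 5% default is an assumption for this sketch, not a measured limit.
    """
    return abs(measured - nominal) > tolerance * abs(nominal)


if __name__ == "__main__":
    for (chip, label), volts in read_voltages().items():
        print(f"{chip:12s} {label:10s} {volts:7.3f} V")
```

Running it in a loop while all the drives are busy would show whether any reading sags under load; which channel corresponds to which rail depends on the motherboard.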

The HD specifications should also state the maximum current they require
on each of the power rails they use (+5 V and +12 V).  The power supply
specifications should state the maximum current it can deliver on those
rails (there is a limit per rail, separate from the 600W total maximum
power).  You obviously have various other devices drawing power, but
that should give you some idea of whether this is likely to be a
problem.
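
The per-rail comparison above can be sketched as a small calculation. Every figure below is a hypothetical placeholder, not taken from any drive datasheet or from the Cooler Master specification; substitute the values printed on the drive labels and on the power supply.

```python
def rail_headroom(drive_count, amps_per_drive, rail_limit_amps, other_loads_amps=0.0):
    """Return the current (A) left on one rail after all loads are summed.

    A negative result means the rail would be overloaded.
    """
    total_load = drive_count * amps_per_drive + other_loads_amps
    return rail_limit_amps - total_load


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    DRIVES = 15
    DRIVE_12V_PEAK = 2.0   # A per drive on +12 V (assumed spin-up peak)
    DRIVE_5V_PEAK = 0.5    # A per drive on +5 V (assumed)
    PSU_12V_LIMIT = 40.0   # A available on +12 V (assumed)
    PSU_5V_LIMIT = 25.0    # A available on +5 V (assumed)

    print("+12 V headroom:", rail_headroom(DRIVES, DRIVE_12V_PEAK, PSU_12V_LIMIT), "A")
    print("+5 V headroom: ", rail_headroom(DRIVES, DRIVE_5V_PEAK, PSU_5V_LIMIT), "A")
```

The other_loads_amps parameter is where the CPU, RAID card, fans and so on would be accounted for. With those included, a small or negative headroom on either rail would make the power supply a plausible suspect.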

> >
> > [...]
> >> [    5.088841] EDAC amd64: This node reports that Memory ECC is currently disabled, set F3x44[22] (0000:00:18.3).
> >> [    5.088885] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> >> [    5.088886]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
> >> [    5.088887]  (Note that use of the override may cause unknown side effects.)
> >> [    5.088978] amd64_edac: probe of 0000:00:18.2 failed with error -22
> > [...]
> >
> > It would also be sensible to enable ECC on such an important machine.
> This requires specific RAM chips, doesn't it?

It requires ECC memory modules, yes.

Ben.


