
Re: weird mdadm crash



On Thu, Mar 08, 2007 at 08:30:57AM -0800, michael wrote:
> Hello,
> 
> Have an etch box that does nothing but rsync data with another.
> About every other day or so, the box will completely freeze.
> Everything, screen blank, no keyboard, and the hard drive light
> is on solid.
> I can hard reboot it and it comes up, and there is nothing in the logs that
> suggest anything.
> The root system is an mdadm RAID 5 array, and every time I reboot
> it from a crash, the array is always degraded. It auto rebuilds itself,
> and away it goes again. A few days later, it will lock up.
> 
> I have no idea where to start looking for problems. I'm pretty sure it's gotta
> be hardware, but not sure where to look first.
> Any suggestions would be great!
> 

AIUI, the order of failure, from most likely to least likely, is:
power supply, hard drives, memory, other stuff.

Power supplies are hard to test without equipment, unless you know
you've got sensors set up properly. But it's still worth a shot -- set
up lm-sensors and look at your voltages. If they're more than +/- 5%
from spec, then start with a new power supply.

Hard drives should generally leave some kind of log entries right
before they go down, and with RAID you shouldn't see a lock-up, unless
you're sharing controllers, maybe. If the drives are SMART-enabled,
then check that out.

I think memory errors are pretty much impossible to diagnose through
any method other than swapping sticks in a systematic way.
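
For the +/- 5% voltage check, here is a rough sketch of the arithmetic
in Python. It assumes you copy the rail readings by hand from whatever
your sensors setup reports. The rail names, the NOMINAL table and the
sample readings are my own illustration, not output from a real box,
so adjust them to match your labels.

```python
# Nominal voltages for the main ATX rails (assumed labels; yours may differ).
NOMINAL = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}


def out_of_spec(readings, tolerance=0.05):
    """Return {rail: (measured, deviation)} for rails beyond the tolerance.

    deviation is a signed fraction of nominal, e.g. -0.06 means 6% low.
    """
    bad = {}
    for rail, measured in readings.items():
        nominal = NOMINAL[rail]
        deviation = (measured - nominal) / nominal
        if abs(deviation) > tolerance:
            bad[rail] = (measured, deviation)
    return bad


# Hypothetical readings, typed in by hand for illustration only.
sample = {"+3.3V": 3.31, "+5V": 4.70, "+12V": 12.2}

for rail, (measured, deviation) in out_of_spec(sample).items():
    print(f"{rail}: {measured:.2f} V is {deviation:+.1%} from nominal")
```

Bear in mind that motherboard sensors can be mislabeled or badly
scaled, so a reading that looks way off may be a sensor config problem
rather than a dying supply.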

good luck

A
