
Re: How to tell when a hardware RAID disk goes bad?



Is there a user-space daemon that monitors the device?

We use one for the HP (née Compaq) array controllers, called cpqarrayd.

It's decent enough: it just runs in the background and sends syslog messages and/or SNMP traps, as configured.
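
For what it's worth, here's a rough sketch in Python of the log-scanning side of it (untested, and the keywords are pure guesswork -- adjust them to whatever your daemon or driver actually writes). It remembers how far it read last time, so run it from cron every few minutes and cron will mail you anything it prints:

#!/usr/bin/env python
# Scan any new lines in /var/log/messages for array-failure keywords
# and print the hits; cron mails the output to you.
# The keyword list is a guess -- adjust it to match the messages your
# daemon or driver actually logs.

import re

LOG = "/var/log/messages"
STATE = "/var/run/raidwatch.offset"   # how far we read last time
BAD = re.compile(r"fail|degraded|offline|rebuild", re.IGNORECASE)

def main():
    try:
        offset = int(open(STATE).read())
    except (IOError, ValueError):
        offset = 0                    # first run, or state file lost

    log = open(LOG)
    log.seek(0, 2)                    # jump to the end to learn the size
    if log.tell() < offset:
        offset = 0                    # file shrank: it was rotated
    log.seek(offset)

    for line in log.readlines():
        if BAD.search(line):
            print(line.rstrip())      # printed lines get mailed by cron

    open(STATE, "w").write(str(log.tell()))
    log.close()

if __name__ == "__main__":
    main()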

From a quick apt-cache search I see there is also a program called array-info, which gives information about various kinds of arrays.

Is there anything in /proc that gives you the drive status?

Something you could regex from a cron job, sending alerts of some description if there is a failure?
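
Something like this, maybe. The /proc path below is made up (I don't know what, if anything, aacraid exposes under /proc), and it assumes a local MTA listening on localhost, but it shows the shape of the thing:

#!/usr/bin/env python
# Cron-driven status check: regex a controller status file under /proc
# and mail root if any line looks unhealthy. Both the path and the
# keyword list are assumptions -- check what your driver really
# exposes and adjust them before relying on this.

import re
import smtplib
from email.mime.text import MIMEText

STATUS_FILE = "/proc/scsi/aacraid/0"   # hypothetical path
BAD = re.compile(r"failed|degraded|rebuild|offline", re.IGNORECASE)

def alert(body):
    # Mail the bad news to root via the local MTA.
    msg = MIMEText(body)
    msg["Subject"] = "RAID status warning"
    msg["From"] = "raidcheck@localhost"
    msg["To"] = "root"
    s = smtplib.SMTP("localhost")
    s.sendmail(msg["From"], [msg["To"]], msg.as_string())
    s.quit()

def main():
    try:
        status = open(STATUS_FILE).read()
    except IOError as e:
        # Not being able to read the status at all is worth an alert too.
        alert("could not read %s: %s" % (STATUS_FILE, e))
        return
    bad = [l for l in status.splitlines() if BAD.search(l)]
    if bad:
        alert("\n".join(bad))

if __name__ == "__main__":
    main()

A line in /etc/crontab running it every few minutes would do; it stays quiet as long as nothing matches.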

Dean

Neil Gunton wrote:
Hi all,

I've just had a new server built and installed in a remote datacenter. It's a Xeon (L5410) running AMD64, with an Adaptec 5805 8-port RAID card, running 8 SATA 2.5" drives in RAID10. I believe the driver is aacraid.

Now all of this is very nice, but there's something that has been bugging me for a while now: How do you tell when one of those drives goes bad and needs to be replaced? I assume something gets written to /var/log/messages, but what exactly do I look out for?

I have tried logcheck in the past, but it throws up far too many warnings, which necessitates almost constant tweaking of the filters and rules to reduce the noise. It got irritating, so I stopped using it.

All I want is to know when a drive goes bad. Most of the info out there on the Web is of limited use, since a lot of it dates from 2004 or before.

I know there is an Adaptec RAID management tool in the Debian repository, named dpt-i2o-raidutils, but this is not an I2O device, so I assume it doesn't apply here.

Any ideas? This seems to be one of those vague areas that nobody talks about much.

Thanks!

Neil



