Re: How to tell when a hardware RAID disk goes bad?
is there a user space daemon that monitors the device?
we use one for the hp (née compaq) array controllers, called cpqarrayd.
it's decent enough: it just runs in the background and sends syslog
messages and/or SNMP traps, as configured.
from a quick apt-cache search i see there is a program called array-info
which gives info about various arrays.
is there anything in /proc that gives you the drive status? something
you could regex with a cron job and use to send an alert of some
description if there's a failure?
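as a rough sketch of that cron idea: grep the kernel log for
controller errors and report anything found. the `aacraid` pattern
below is an assumption (that's the driver family for Adaptec cards
like the 5805) -- the only reliable way to know what actually gets
logged is to pull a drive on a test box and look.

```shell
#!/bin/sh
# sketch of a cron check: scan the kernel log for RAID-controller
# errors. PATTERN is a guess -- verify it against what your driver
# really logs when a drive drops.
LOG=/var/log/messages
PATTERN='aacraid.*(error|fail|degrad|offline)'

# grep exits non-zero when nothing matches; treat that as "all clear"
matches=$(grep -Ei "$PATTERN" "$LOG" 2>/dev/null) || exit 0

# from cron you'd pipe this to something like:
#   mail -s "RAID warning on $(hostname)" root
printf '%s\n' "$matches"
```

run from /etc/cron.d every few minutes; a smarter version would
remember its last offset (or use logtail) so it doesn't re-report
the same lines each run.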
Neil Gunton wrote:
I've just had a new server built and installed in a remote datacenter.
It's a Xeon (L5410) running AMD64, with an Adaptec 5805 8-port RAID
card, running 8 SATA 2.5" drives in RAID10. I believe the driver is
Now all of this is very nice, but there's something that has been
bugging me for a while now: How do you tell when one of those drives
goes bad and needs to be replaced? I assume something gets written to
/var/log/messages, but what exactly do I look out for?
I have tried logcheck in the past, but it tends to be very annoying,
throwing up way too many warnings and necessitating almost constant
tweaking of the filters and rules to reduce the noise. It was
irritating, so I stopped using it.
All I want is to know when a drive goes bad. Much of the info out there
on the Web is useless, since a lot of it dates from 2004 or before.
I know that there is an Adaptec RAID management tool in the Debian
repository, named dpt-i2o-raidutils, but this is not an i2o device, so
I assume it doesn't apply here.
Any ideas? This seems to be one of those vague areas that nobody talks
about.