Re: How to tell when a hardware RAID disk goes bad?
Neil Gunton wrote:
Hi all,
I've just had a new server built and installed in a remote datacenter.
It's a Xeon (L5410) running AMD64, with an Adaptec 5805 8-port RAID
card, running 8 SATA 2.5" drives in RAID10. I believe the driver is
aacraid.
Now all of this is very nice, but there's something that has been
bugging me for a while now: How do you tell when one of those drives
goes bad and needs to be replaced? I assume something gets written to
/var/log/messages, but what exactly do I look out for?
I have tried logcheck in the past, but it tends to be very annoying,
throwing up way too many warnings, which necessitates almost constant
tweaking of the filters and rules to try to reduce the noise. It was
irritating, so I stopped using it.
All I want is to know when a drive has gone bad. Much of the info out
there on the Web is of little use, since a lot of it dates from 2004 or
earlier.
I know that there is an Adaptec RAID management tool in the Debian
repository, named dpt-i2o-raidutils, but this is not an i2o device so I
assume it doesn't apply here.
Any ideas? This seems to be one of those vague areas that nobody talks
about much.
Thanks!
Neil
Update: I asked Adaptec, and got the following response via their ASK
system:
<SNIP>
The normal way if you are accessing remotely the system is to use
Adaptec Storage Manager on the host system, and configure the
notification for any issue. With this option, you can also send an email
to your inbox for the case any issue occur.
If you are not having any graphic interface on your linux system, you
can also use the CLI interface. to send scripts to do on the controller,
like getting the status of the configuration. using Linux commands, it
is then possible to send regularly these scripts and get a status of the
Raid array.
You can get more information to CLI here:
http://download.adaptec.com/pdfs/user_guides/CLI_v5_30_Users_Guide.pdf
</SNIP>
So I went to their website (Support, Downloads) and found the Storage
Manager. Unfortunately it only comes in rpm format, with no source as far
as I can see. I downloaded the 64-bit version, converted it to a Debian
package using alien, and installed it using dpkg. Eventually, I was able
to use this command to list the status of all the physical devices:
shell> arcconf GETCONFIG 1 PD
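For anyone wanting to script that check, here is a rough sketch in Python that runs the same command and picks out each drive's state. It assumes the output contains a "Device #N" line for each drive followed by a "State : ..." line, and that any state other than "Online" deserves a look. Both are assumptions to verify against what your own controller prints. The function names are made up for illustration.

```python
import re
import subprocess

# Assumed line shapes in "arcconf GETCONFIG 1 PD" output; check them
# against your own controller's output before relying on this.
DEVICE_RE = re.compile(r"^\s*Device #(\d+)\s*$")
STATE_RE = re.compile(r"^\s*State\s*:\s*(.+?)\s*$")

# Assumption: only "Online" counts as healthy. Add other states
# (spares, for example) here if your setup reports them.
HEALTHY_STATES = {"Online"}


def parse_device_states(text):
    """Map device number -> first reported state in the given output."""
    states = {}
    current = None
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = STATE_RE.match(line)
        if m and current is not None and current not in states:
            states[current] = m.group(1)
    return states


def unhealthy_devices(states):
    """Return only the devices whose state is not in HEALTHY_STATES."""
    return {dev: st for dev, st in states.items() if st not in HEALTHY_STATES}


def check_controller(controller=1):
    """Run arcconf for one controller and return its unhealthy devices."""
    out = subprocess.run(
        ["arcconf", "GETCONFIG", str(controller), "PD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return unhealthy_devices(parse_device_states(out))
```

Calling check_controller() from a cron job and printing anything it returns would be one way to use this, since cron mails a job's output to its owner.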
I initially had the StorMan daemon also running, but I've found that
arcconf seems to work to do a simple check even without the daemon. I'll
look into whether it's useful to have the daemon running (e.g. to give
me the email notifications automatically).
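If the daemon turns out not to be needed, the email notification could also be done by hand. The sketch below is one possible way, not anything Adaptec provides: it builds a plain-text warning from a mapping of device number to state and hands it to a local mail server. The function names, the addresses and the SMTP host are all placeholders, and it assumes a mail transfer agent is listening on localhost.

```python
import smtplib
from email.message import EmailMessage


def build_alert(hostname, failed,
                sender="root@localhost", recipient="root@localhost"):
    """Build a warning mail; `failed` maps device number -> state string."""
    msg = EmailMessage()
    msg["Subject"] = (
        f"RAID warning on {hostname}: {len(failed)} device(s) not Online"
    )
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"Device #{dev}: {state}" for dev, state in sorted(failed.items())]
    msg.set_content("\n".join(lines))
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Send the mail through an SMTP server (assumed to be on port 25)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

For example, send_alert(build_alert("myserver", {3: "Failed"})) would mail a one-line report about device 3.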
Neil