
Re: How to tell when a hardware RAID disk goes bad?



Neil Gunton wrote:
Hi all,

I've just had a new server built and installed in a remote datacenter. It's a Xeon (L5410) running AMD64, with an Adaptec 5805 8-port RAID card, running 8 SATA 2.5" drives in RAID10. I believe the driver is aacraid.

Now all of this is very nice, but there's something that has been bugging me for a while now: How do you tell when one of those drives goes bad and needs to be replaced? I assume something gets written to /var/log/messages, but what exactly do I look out for?

I have tried logcheck in the past, but it tends to be very annoying, throwing up way too many warnings, which necessitates almost constant tweaking of the filters and rules to try to reduce the noise. It was irritating, so I stopped using it.

All I want is to know when a drive goes bad. Much of the info out there on the Web is of little use, since a lot of it dates from 2004 or earlier.

I know that there is an Adaptec RAID management tool in the Debian repository, named dpt-i2o-raidutils, but this is not an i2o device so I assume it doesn't apply here.

Any ideas? This seems to be one of those vague areas that nobody talks about much.

Thanks!

Neil



Update: I asked Adaptec, and got the following response via their ASK system:

<SNIP>
The normal way, if you are accessing the system remotely, is to use Adaptec Storage Manager on the host system and configure notification for any issue. With this option, it can also send an email to your inbox whenever an issue occurs. If you do not have a graphical interface on your Linux system, you can use the CLI instead to send scripts to the controller, for example to get the status of the configuration. Using Linux commands, it is then possible to run these scripts regularly and get the status of the RAID array.
You can find more information on the CLI here:
http://download.adaptec.com/pdfs/user_guides/CLI_v5_30_Users_Guide.pdf
</SNIP>

So I went to their website (Support, then Downloads) and found the Storage Manager. Unfortunately it only comes in rpm format, with no source as far as I can see. I downloaded the 64-bit version, converted it to a Debian package using alien, and installed it using dpkg. Eventually, I was able to use this command to list the status of all the physical devices:

shell> arcconf GETCONFIG 1 PD

I initially had the StorMan daemon running as well, but I've found that arcconf works for a simple check even without the daemon. I'll look into whether it's useful to have the daemon running (e.g. to give me the email notifications automatically).
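
For anyone wanting to script the check that Adaptec describes, here is a rough sketch of a filter for the arcconf output. It is not an official Adaptec method. It assumes the output contains "Device #N" header lines, each followed by a "State : <value>" line, and that a healthy drive reports "Online"; please compare against what your own arcconf version prints before relying on it. The function name bad_drives and the mail recipient are made up for illustration.

```shell
#!/bin/sh
# Reads `arcconf GETCONFIG 1 PD` output on stdin and prints one line for
# every device whose State is anything other than "Online".
# ASSUMPTION: the layout is "Device #N" followed by "State : <value>";
# this is not taken from Adaptec documentation, so verify it locally.
bad_drives() {
    awk '
        # Remember the most recent device header, minus its indentation.
        /Device #[0-9]+/ { dev = $0; sub(/^[ \t]+/, "", dev) }
        # Match only lines that begin with "State", not e.g. "Power State".
        /^[ \t]*State[ \t]*:/ {
            state = $0
            sub(/^[^:]*:[ \t]*/, "", state)   # drop everything up to the colon
            sub(/[ \t\r]+$/, "", state)       # drop trailing whitespace
            if (state != "Online") print dev ": " state
        }
    '
}

# Possible use from cron (recipient is a placeholder):
#   out=$(arcconf GETCONFIG 1 PD | bad_drives)
#   [ -n "$out" ] && echo "$out" | mail -s "RAID drive problem" root
```

Note that this flags every state other than "Online", so a drive that is deliberately in some other state (a hot spare, for instance) would also be reported and the comparison would need adjusting.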

Neil

