
RE: RAID1 problem - server freezes on md data-check



(Sorry Thomas - hit Reply instead of Reply All by mistake)

> -----Original Message-----
> From: Thomas Goirand [mailto:thomas@goirand.fr]
> Sent: Monday, January 04, 2010 7:34 AM
> To: debian-isp@lists.debian.org
> Subject: Re: RAID1 problem - server freezes on md data-check
> 
> Ross Halliday wrote:
> > Aside from any bugs, that checkarray
> > function is definitely a pain on a production system.
> 
> Well, it's even more of a pain to have no monthly check at all, and have 
> your drive silently die without a warning. Also, my finding is that 
> most of the time, such lock-ups happen only on certain kinds of 
> controllers, or with defective (half-working) HDDs.
> 
> Thomas

Yes and no: as I see it, RAID1 is less about protecting the data
itself and more a 'hot spare' idea, so that if one disk bites the dust
there is instantaneous failover. It's a very basic design and I would
say it holds true to its name: "Redundant Array of Independent Disks".
Technologies like RAID5, which have parity checking, will tell you the
instant one disk is behaving badly and kick it out of the array. This is
better suited to protecting against partial failures and data corruption.
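For anyone less familiar with what a data-check actually compares, here is a simplified model. It is not md's actual code, and the function names are made up for illustration. On RAID1 the check compares the mirrored copies of each block. On RAID5 it recomputes the XOR of the data blocks in a stripe and compares that against the stored parity. In either case it only detects an inconsistency: with two mirrors or a single parity block, the comparison alone cannot tell which block is the bad one. In Linux md this comparison happens during a check/repair pass, not on every normal read.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID5-style parity)."""
    # zip(*blocks) walks the blocks column by column, one byte offset at a time
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_consistent(data_blocks: list[bytes], parity: bytes) -> bool:
    """Hypothetical RAID5 check: does the stored parity match the data?"""
    return xor_blocks(data_blocks) == parity


def mirror_consistent(copies: list[bytes]) -> bool:
    """Hypothetical RAID1 check: are all mirrored copies identical?"""
    return all(copy == copies[0] for copy in copies[1:])
```

For example, `stripe_consistent([b"\x01\x02", b"\x03\x04"], b"\x02\x06")` is true. A mismatch only tells you that something is wrong, not where.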

I have to seriously question the value of this once-a-month check, since for the
other 27-30 days of the month your disk could be half-dead, spewing
corrupt data, and you'd never know until it was Sunday at 1:06 AM. It
seems like a sort of afterthought hack that renders your disks unusable
for a few hours. It would make more sense if the check were run nightly,
but you would probably see a lot of upset people complaining that
backups take forever and that after-hours performance takes a massive
hit. Perhaps there is some other way to check data integrity more often than
approximately once per full moon that doesn't destroy I/O performance?
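One knob that may help with the I/O side is md's per-array speed limit. Here is a rough sketch, assuming a kernel that exposes the usual md sysfs files (`sync_speed_max`, `sync_action` and `mismatch_cnt` under `/sys/block/<array>/md/`). The array name `md0` and the 5000 KiB/s cap are illustrative only. The helper names are hypothetical, and the `sysfs_root` parameter exists only so the functions can be tried against a scratch directory. The idea is to cap a check's bandwidth so it can run more often without flattening everything else. Verify the paths on your own kernel before trusting this.

```python
from pathlib import Path


def md_dir(sysfs_root: Path, array: str) -> Path:
    """Return the md control directory, e.g. /sys/block/md0/md."""
    return sysfs_root / "block" / array / "md"


def start_throttled_check(sysfs_root: Path, array: str, max_kib_per_s: int) -> None:
    """Cap this array's check/resync bandwidth, then start a check."""
    md = md_dir(sysfs_root, array)
    # sync_speed_max is in KiB/s and applies only to this array
    (md / "sync_speed_max").write_text(f"{max_kib_per_s}\n")
    # "check" reads the whole array and counts mismatches in mismatch_cnt;
    # unlike "repair", it does not rewrite mismatched blocks
    (md / "sync_action").write_text("check\n")


def check_status(sysfs_root: Path, array: str) -> tuple[str, int]:
    """Return (current sync_action, mismatch_cnt) for the array."""
    md = md_dir(sysfs_root, array)
    action = (md / "sync_action").read_text().strip()
    mismatches = int((md / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

On a real box this would be `start_throttled_check(Path("/sys"), "md0", 5000)` run as root, followed by polling `check_status` until the action returns to `idle`. A slower check takes longer, of course, so this trades total duration for less impact while it runs.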

I'm not trying to start a war or anything, and I apologize if I sound
like a complete idiot. Of course, any corrections are welcome. However,
the above is my opinion, built on my knowledge of RAID systems, some of
which is probably inaccurate, as I am just a lowly sysadmin :)


Cheers

---
Ross Halliday
Network Operations
WTC Communications



