
Re: Software RAID and drive failures



Juhan Kundla said:

> I plan to buy two smaller (and cheaper) IDE disks and use them in a
> RAID-1 array. I hope that this gives me good protection against hardware
> failures. If one disk fails, the other will still have my data intact,
> right? The main question is: how good is software RAID when one drive is
> not lost completely, but starts to have more and more bad blocks? Will
> RAID-1 protect me from data corruption in that case? Any
> comments?

Sort of. I can think of only a couple of RAID-1 arrays that I've set up
on IDE disks using software RAID. One used 2x20GB IBM 75GXP disks, the
most unreliable disks available at the moment (in my experience). About
6 months after I installed the array on brand new disks, one of them
started to fail. The system behaved very poorly as the controller
tried, and failed, to recover. I think I ended up having to
hit the reset button on the system. After that, I booted the system
in single user mode and disabled the RAID so it wouldn't try to use
the 2nd disk anymore (this was on raid 0.36, which is different from
the raid 0.90 in 2.4.x and in patched 2.2.x kernels). I rebooted again and
the system ran fine for another few months until the 2nd disk failed.
I probably didn't have to disable the RAID but I did anyway. raid 0.90
behaves better than 0.36.

I don't recall losing any data (ext2 filesystem), though much of the
data was stored on my NFS file server at the time.

So the biggest risk for data corruption is the system crashing due
to a disk failure. I have had Linux systems hobble along with a failed
disk for weeks on end (the system gets increasingly loaded and less
responsive). I've even had a system keep running after the root disk
failed (though logins were impossible and no new processes would start).
But that's the price you pay. You will not eliminate the chance of data
corruption or data loss, even with hardware RAID. You can only reduce it
to a level where the chance of it happening is acceptable (maybe a
0.0001% chance on some higher end systems).

When choosing your IDE disks, research a lot. Many IDE disks are of
very low quality. I have heard that the Western Digital special edition
drives have a good 3 year warranty. I have had 2 such drives in software
RAID-1 for a year and a half now without a single glitch. Of course,
the system operates in an ideal environment: 90CFM of airflow
going through the case, a sine-wave APC SmartUPS powering it, and
a 450 watt PC Power & Cooling power supply. There is 1 drive per cable,
the IDE cables are 18" (longer and you're asking for trouble in some
systems), and the drives are mounted in 5.25" bays without "canisters"
for maximum airflow. The system is monitored 24/7.
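
As one piece of that kind of monitoring, a rough sketch of spotting a
degraded software RAID array from /proc/mdstat. This assumes the raid
0.90 style status line, where a map like [UU] means both mirrors are up
and [U_] means one is missing. The function name degraded_arrays is made
up for illustration, and the format may differ on other kernels.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status map shows a missing member.

    Assumes each array starts with a line like "md0 : active raid1 ..."
    followed by a line carrying a status map such as [UU] or [U_].
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            # An underscore marks a slot with no working disk in it.
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        for name in degraded_arrays(f.read()):
            print("array %s is running degraded" % name)
```

Run from cron, something like this could mail a warning before the
second disk has a chance to go as well.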

> I know I still have to take backups, because RAID and mirroring won't
> protect me against other types of failures. I was thinking about using a
> separate, much bigger IDE disk for backups. If the backup drive were 7
> times bigger than those smaller disks, then I could take a full backup
> every day of the week and have seven copies of my data, each taken at a
> different time. This gives me a maximum of one week to react to data loss
> or corruption, and if I accidentally deleted the wrong files, I could
> restore them from a backup that is not older than 24 hours.

This is a good idea too. You can use rsync with the hard link option
to reduce the amount of space (and time) needed for storing multiple
backups of the same data on the same filesystem (I haven't tried this
myself but hear it's good).
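
A minimal sketch of how that could look, assuming an rsync new enough
to have the --link-dest option (files unchanged since the previous
snapshot become hard links instead of new copies). The paths, the
dated directory layout, and the function name build_rsync_command are
only examples, not something I have run in production.

```python
import datetime
import os


def build_rsync_command(source, backup_root, today, previous=None):
    """Build the rsync argv for a dated snapshot under backup_root.

    If a previous snapshot date is given, unchanged files are hard-linked
    against it via --link-dest, so only changed files use new space.
    """
    dest = os.path.join(backup_root, today.isoformat())
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # Use an absolute path: a relative --link-dest is interpreted
        # relative to the destination directory.
        link_dir = os.path.join(backup_root, previous.isoformat())
        cmd.append("--link-dest=" + link_dir)
    # The trailing slash makes rsync copy the contents of source.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


if __name__ == "__main__":
    import subprocess

    today = datetime.date.today()
    yesterday = today - datetime.timedelta(days=1)
    # Hypothetical paths; adjust to the real data and backup disk.
    subprocess.run(
        build_rsync_command("/home", "/backup", today, yesterday),
        check=True,
    )
```

Each dated directory then looks like a full backup, and deleting the
oldest one only frees the files that no newer snapshot still links to.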


>
> So what do you think: is my plan plain stupid, or does this really give me
> some protection against data loss? Should I investigate any other
> technologies? EVMS?

I would avoid EVMS, and probably even LVM, for a really critical system;
the code isn't all that great (from what I've read). In fact the LVM
stuff in 2.4.x is being totally ripped out and replaced from scratch.
If you really need the functionality I guess it wouldn't hurt too
badly, but it's just one more thing that could go wrong on the box.

Your situation sounds doable. Just don't expect to work miracles
on the lowest of the low end. Not sure how much data you have, but
9GB SCSI disks are quite cheap now (~$50). I picked up two Mylex
AcceleRAID 150s with 4MB cache (Ultra 2 SCSI RAID) for $40/ea about
6 months ago, and haven't seen any deals like that since. One of them
is in my Red Hat machine with 5x9GB disks and works great. The other
isn't being used yet.

Don't be surprised if the system crashes when a disk fails. IDE
disk failures tend to wreak havoc when they finally go, at least
in my experience (having had about 30 such disks fail in the past
2 years across about 3 dozen systems).

nate




