
Re: backup archive format saved to disk



On Wed, Dec 06, 2006 at 01:11:06PM -0600, Mike McCarty wrote:
> Andrew Sackville-West wrote:
> >On Wed, Dec 06, 2006 at 02:52:29PM +0100, Johannes Wiedersich wrote:
> >
> >>Question: how likely is it that both disks develop bad blocks, while
> >>none of them is damaged? I'm no expert on this, but I guess a better
> >>strategy might be to rotate backups on two disks, and use (and check:
> >>fsck and smartctl) them regularly.
> >
> >if the chance of a disk failure is (say) 1% in the time allotted, then
> >the chance of having a failure with two disks is 2%. The chance of any one
> 
> I don't follow this reasoning. Are you presuming independence of the
> failures and identical probabilities? If so, then this is the way to
> compute it:
> 
> Let p be the probability of failure of each disc, independently of the
> other. There are four mutually exclusive events which comprise the
> space. Both discs may fail [Pr = p^2]. The first disc may fail, while
> the second does not [Pr = p(1-p)]. The second disc may fail, while the
> first does not [Pr = (1-p)p]. Both discs may survive [Pr = (1-p)(1-p)].
> 
> So, the probability that at least one disc fails is 1-(1-p)(1-p).
> For p = 0.01, that is 0.0199.
> 
> I'll grant you this is not markedly different from 2%, but it is also
> not simply 2p.
> 
> >particular disk failing is still 1%, it's the odds of A failure in the
> >system as a whole that goes up. So with more disks you're more likely
> >to have failures of some kind, but the per disk failure stays the same
> >and the odds of losing ALL of them goes the other way. The odds of
> >losing BOTH disks is .1%. the question becomes, which one has
> >failed...
> 
> I don't follow this reasoning. The probability of both discs failing
> (if they do so independently) is not 0.1%, but rather 0.01%. A partially
> failed disc is usually easy to detect, since they have FEC on them. A
> completely failed disc is even easier to detect :-)
> 

Mike, 

Without expending any mathematical energy, could you recompute your two
probabilities based on a set of three disks instead of 2?  I'm guessing
that the probability of one disk failing goes up but the probability of
all three failing drops substantially (the famous triple-redundancy
theory).
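
For reference, here is a rough sketch of how the two figures above
generalize to n disks. It assumes, as in your two-disk calculation, that
every disk fails independently with the same probability p. The function
name failure_probabilities is just something I made up for illustration.

```python
def failure_probabilities(p, n):
    """Return (P(at least one of n disks fails), P(all n disks fail)).

    Assumption: each disk fails independently with the same probability p.
    """
    # "At least one fails" is the complement of "every disk survives".
    at_least_one = 1 - (1 - p) ** n
    # "All fail" is the product of n independent failures.
    all_fail = p ** n
    return at_least_one, all_fail


# Compare the two-disk case from the quoted text with three disks.
for n in (2, 3):
    any_fail, all_fail = failure_probabilities(0.01, n)
    print(f"{n} disks: at least one fails = {any_fail:.6f}, "
          f"all fail = {all_fail:.0e}")
```

If that is right, then with p = 0.01 and three disks the chance of at
least one failure is about 2.97%, and the chance of losing all three is
0.0001%.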

I'm assuming that a partially failed disk will return good data (because
of the FEC) and that an error notice ends up in syslog (do you know the
severity)?  

How does a raid1 array handle a partially failing disk?  Does it just
take the good data and carry on until the drive completely fails or does
mdadm also get involved in issuing a warning of a failing drive?

Thanks,

Doug.


