
Re: backup archive format saved to disk



Andrew Sackville-West wrote:
> if the chance of a disk failure is (say) 1% in the time allotted, then
> the chance of having a failure with two disks is about 2%. The chance of
> any one particular disk failing is still 1%; it's the odds of A failure
> in the system as a whole that go up. So with more disks you're more
> likely to have failures of some kind, but the per-disk failure rate
> stays the same and the odds of losing ALL of them go the other way. The
> odds of losing BOTH disks are 0.01%. The question becomes, which one
> has failed...

...the one with the bad blocks, I would guess. I would just diff the
disks and manually inspect the files that differ. Opening the data with
the appropriate application (be it JPEGs, text files or office
documents) will usually tell you at a glance which copy is damaged. (Of
course, one could also save md5 sums or the like.) I would guess that
the case where more than a handful of files differ between the two
disks, with both disks damaged yet still partially readable, is
extremely unlikely. Most of the damaged disks I have seen so far did
not work at all, so being lucky enough that both disks fail gracefully
seems very unlikely to me.
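
The "save md5 sums or the like" idea can be sketched roughly as follows.
This is only an illustration, not a finished tool: the function names are
made up for this example, and SHA-256 is used here in place of md5 (either
would do for spotting accidental corruption). It builds a checksum list
for each disk and reports which files are missing, extra, or different.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) onto its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old, new):
    """Return (missing, added, changed) as sorted lists of relative paths."""
    missing = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return missing, added, changed
```

With both disks mounted (the mount points are placeholders), something
like compare_manifests(build_manifest("/mnt/disk_a"),
build_manifest("/mnt/disk_b")) gives the short list of files worth
opening by hand. A manifest saved at backup time would additionally tell
you which of two differing copies is the damaged one.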

If I put the same disks in a bank vault for years, I guess the failure
probability for the first few months will be 1% for each. After a long
enough time, closer to the end of the disks' lifetime, the situation
may be different and each disk may have a failure probability of, say,
50%. So in 25% of all cases you lose data. (If you leave them in the
vault even longer, this risk will not just approach 100%; sooner or
later it will reach 100%, whenever that may be.)
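
For anyone who wants to play with the numbers, here is a small sketch of
the arithmetic. It assumes the disks fail independently of each other and
with the same probability, which is a simplification (two disks from the
same batch stored in the same vault may well fail together). The function
names are just for illustration.

```python
def p_any_failure(p, n):
    """Chance that at least one of n disks fails, each with probability p."""
    return 1 - (1 - p) ** n


def p_all_fail(p, n):
    """Chance that all n disks fail, i.e. that the data is lost."""
    return p ** n


# Early on: 1% per disk, two disks.
print(f"{p_any_failure(0.01, 2):.2%}")  # 1.99%, some disk fails
print(f"{p_all_fail(0.01, 2):.2%}")     # 0.01%, both fail

# Near end of life: 50% per disk, two disks.
print(f"{p_all_fail(0.5, 2):.2%}")      # 25.00%, both fail
```

Adding disks pushes the first number up and the second one down, which is
the trade-off discussed above.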

In order to increase your chances of at least one disk surviving, it is
essential that you check your disks regularly and replace a failing one
before the second one fails. Ultimately, a disk will die of things like
age-related degradation of the polymers that hold the magnetic
particles with your data in place, or of some glue inside the housing
that keeps important parts together, or some other 'old age' related
problem.

So you have to check your disks at intervals much shorter than the
average lifetime of a disk.

Since you have to check your disks regularly anyway, I think the better
strategy is to rotate backups while you are at it. If there is an
undetected error from the backup process (or something like an
undetected damaged file in your original data), you have the chance to
revert to the previous backup. Bad blocks are not the only way to lose
data or backups.

Just my 2ct.

Johannes


