Re: backup archive format saved to disk
On Wed, 06 Dec 2006 15:40:33 +0100, hendrik wrote:
>
> If you want to be able to recover data despite damage, it is in general
> not wise to compress it, since different parts will be damaged
> independently, and the undamaged parts will still be readable.
> Squeezing out redundancy makes different parts of the data dependent on
> one another for interpretation.
No, you _should_ compress it and then use some of the space you saved to
add some carefully chosen redundancy which will allow you to reconstruct
everything, not just some things, in case of failure. (E.g., using par2.)
Scenario A: Compression
Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. A failure occurs, and 2M of data
is lost. The archive becomes impossible to decompress, and you lose
everything. You are very sad.
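
A minimal sketch of Scenario A, assuming Python's standard zlib module as a stand-in for whatever compressor the archive actually uses. The sample data and the 64-byte damaged region are made up for illustration. How much can be salvaged from a real archive depends on the format and the tools at hand; this only shows that a plain decompress call rejects the damaged stream outright.

```python
import zlib

# Hypothetical stand-in for the archive: repetitive text that compresses well.
original = b"".join(b"line %06d of some backup data\n" % i for i in range(20000))
compressed = zlib.compress(original)

# Simulate the failure: invert a run of 64 bytes in the middle of the stream.
damaged = bytearray(compressed)
start = len(damaged) // 2
for i in range(start, start + 64):
    damaged[i] ^= 0xFF

# zlib.decompress is all-or-nothing: it either returns the data or raises.
try:
    zlib.decompress(bytes(damaged))
    recovered = True
except zlib.error:
    recovered = False

print("original size:  ", len(original))
print("compressed size:", len(compressed))
print("decompressed after damage:", recovered)
```
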
Scenario B: No compression
Suppose you have 100 megabytes of files, uncompressed. A failure occurs,
and 2M of data is lost. All files intersecting the broken region are
destroyed (modulo any Herculean effort one is willing to put into
reconstruction). You are sad, but not as sad as in Scenario A.
Scenario C: Compression plus redundancy
Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. You then create 10M of redundancy
using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
is lost. You use par2 to reconstruct the archive, and nothing is lost.
(You can do this regardless of whether data, redundancy, or both are
destroyed.) You are happy.
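
A toy sketch of the idea behind Scenario C. par2 itself uses Reed-Solomon codes and can repair several damaged blocks; the code below swaps in plain XOR parity, which is far simpler and can rebuild only one lost block whose position is known. The block size, sample data, and helper names are assumptions made for illustration, not anything par2 does internally.

```python
import zlib
from functools import reduce

BLOCK = 1024  # hypothetical block size for this toy example


def split_blocks(data, size=BLOCK):
    """Split data into equal-size blocks, zero-padding the last one."""
    padded = data + b"\0" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stand-in for the archive, compressed first.
original = b"".join(b"line %06d of some backup data\n" % i for i in range(20000))
compressed = zlib.compress(original)

# Spend a little of the saved space on redundancy: one parity block.
blocks = split_blocks(compressed)
parity = xor_blocks(blocks)

# Simulate the failure: one whole block is lost.
lost = len(blocks) // 2
survivors = blocks[:lost] + blocks[lost + 1:]

# XOR of the parity block with all surviving blocks yields the missing one.
rebuilt = xor_blocks(survivors + [parity])
repaired = survivors[:lost] + [rebuilt] + survivors[lost:]
restored = b"".join(repaired)[:len(compressed)]  # drop the zero padding

print("archive intact after repair:", zlib.decompress(restored) == original)
```

The order matters here: the parity is computed over the compressed bytes, so the repair restores the exact stream the decompressor expects.
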
HTH,
Reid