Re: backup archive format saved to disk
On Wed, 06 Dec 2006 15:40:33 +0100, hendrik wrote:
> 
> If you want to be able to recover data despite damage, it is in general 
> not wise to compress it, since different parts will be damaged 
> independently, and the undamaged parts will still be readable.  
> Squeezing out redundancy makes different parts of the data dependent on 
> one another for interpretation.
No, you _should_ compress it and then use some of the space you saved to
add some carefully chosen redundancy which will allow you to reconstruct
everything, not just some things, in case of failure. (E.g., using par2.)
Scenario A: Compression
  Suppose you have 100 megabytes of files, uncompressed. You create a tar
  archive and compress it down to 75M. A failure occurs, and 2M of data
  is lost. The archive becomes impossible to decompress, and you lose
  everything. You are very sad.
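
A small sketch of Scenario A, in case it helps. It uses Python's zlib
module as a stand-in for whatever compressor you actually use, and the
payload, the `damage` helper, and the 2% figure are all made up for
illustration. How much of a damaged archive is salvageable in practice
depends on the compressor and the tools at hand; this only shows that a
plain decompress no longer gives the data back.

```python
import zlib

def make_payload(n_lines: int = 20000) -> bytes:
    """Build deterministic, moderately compressible stand-in data."""
    return b"".join(
        b"record %d: value=%d\n" % (i, i * i % 977) for i in range(n_lines)
    )

def damage(blob: bytes, start: int, length: int) -> bytes:
    """Return a copy of blob with `length` bytes from `start` inverted."""
    broken = bytearray(blob)
    for i in range(start, min(start + length, len(broken))):
        broken[i] ^= 0xFF  # inverting guarantees every byte really changes
    return bytes(broken)

def survives(compressed: bytes, original: bytes) -> bool:
    """True only if the stream still decompresses to the original data."""
    try:
        return zlib.decompress(compressed) == original
    except zlib.error:
        return False

original = make_payload()
archive = zlib.compress(original)
# Damage roughly 2% of the archive, somewhere in the middle.
hit = damage(archive, len(archive) // 2, max(1, len(archive) // 50))

print("intact archive ok: ", survives(archive, original))
print("damaged archive ok:", survives(hit, original))
```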
Scenario B: No compression
  Suppose you have 100 megabytes of files, uncompressed. A failure occurs,
  and 2M of data is lost. All files intersecting the broken region are
  destroyed (modulo any Herculean effort one is willing to put into
  reconstruction). You are sad, but not as sad as in Scenario A.
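
To make "all files intersecting the broken region" concrete, here is a
rough sketch. The file names, sizes, and the back-to-back layout are
hypothetical (a real filesystem or tar archive also has metadata and
padding); the point is only the interval-overlap check.

```python
def files_hit(layout, damage_start, damage_len):
    """Return the names of files that overlap the damaged range.

    layout is a list of (name, size) pairs stored back to back,
    in the same units as damage_start and damage_len.
    """
    damage_end = damage_start + damage_len
    hit, offset = [], 0
    for name, size in layout:
        start, end = offset, offset + size
        # Two half-open ranges overlap if each starts before the other ends.
        if start < damage_end and damage_start < end:
            hit.append(name)
        offset = end
    return hit

# Hypothetical 100M of files, sizes in megabytes.
layout = [("a.txt", 30), ("b.jpg", 25), ("c.db", 40), ("d.log", 5)]

# 2M lost starting at offset 54M: straddles the end of b.jpg and start of c.db.
print(files_hit(layout, 54, 2))
```

Everything outside the damaged range (here a.txt and d.log) is still
readable as-is.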
Scenario C: Compression plus redundancy
  Suppose you have 100 megabytes of files, uncompressed. You create a tar
  archive and compress it down to 75M. You then create 10M of redundancy
  using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
  is lost. You use par2 to reconstruct the archive, and nothing is lost.
  (You can do this regardless of whether data, redundancy, or both are
  destroyed.) You are happy.
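
For the curious, a toy illustration of how redundancy lets you rebuild
lost data. This is NOT how par2 works internally: par2 uses Reed-Solomon
codes, which can repair many lost blocks (roughly as much damage as you
have recovery data, so 2M of loss against 10M of redundancy is fine).
The sketch below swaps in the simplest possible scheme, a single XOR
parity block, which can rebuild exactly one lost block. The block size
and the stand-in "archive" are made up.

```python
from functools import reduce

def split_blocks(data: bytes, block_size: int) -> list:
    """Split data into equal-sized blocks, zero-padding the last one."""
    padded = data + b"\0" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

def xor_blocks(blocks) -> bytes:
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(blocks, parity: bytes, lost: int) -> bytes:
    """Recover the block at index `lost` from the other blocks plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return xor_blocks(survivors + [parity])

# Stand-in for the compressed archive (hypothetical contents).
archive = bytes((i * 7 + i // 256) % 256 for i in range(10000))
blocks = split_blocks(archive, 1024)
parity = xor_blocks(blocks)      # the extra "redundancy" you store

lost = 3                         # pretend block 3 was destroyed
recovered = rebuild(blocks, parity, lost)
print(recovered == blocks[lost])
```

The same trick works if the parity block is the one that gets destroyed:
you just recompute it from the data blocks.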
HTH,
Reid