
Re: backup archive format saved to disk



On Thu, Dec 07, 2006 at 12:26:13PM -0500, hendrik@topoi.pooq.com wrote:
> On Thu, Dec 07, 2006 at 09:16:11AM -0500, Douglas Tutty wrote:
> > On Wed, Dec 06, 2006 at 09:02:37PM -0600, Reid Priedhorsky wrote:
> >  
> > > No, you _should_ compress it and then use some of the space you saved to
> > > add some carefully chosen redundancy which will allow you to reconstruct
> > > everything, not just some things, in case of failure. (E.g., using par2.)
> >  
> > > Scenario C: Compression plus redundancy
> > > 
> > >   Suppose you have 100 megabytes of files, uncompressed. You create a tar
> > >   archive and compress it down to 75M. You then create 10M of redundancy
> > >   using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
> > >   is lost. You use par2 to reconstruct the archive, and nothing is lost.
> > >   (You can do this regardless of whether data, redundancy, or both are
> > >   destroyed.) You are happy.
> > > 
> > 
> > Hi Reid,
> > 
> > I've been looking at par2.  The question remains how to apply it to data
> > stored on media where the potential failure is one of media, not
> > transmission.  If I only protect the tar.bz2 file and a media failure
> > occurs, how should I have set up the par2 redundancy files to allow me to
> > recover the data?
> > 
> > Apparently, hard disks use FEC themselves, so that either they can fix
> > the data or there is too much damage and the drive is inaccessible.  It
> > seems to be an all-or-nothing proposition.  If someone has experience
> > of FEC drive failures that refutes this, I'd be very interested.
> > 
> > The only disk failures I have experienced are on older drives without
> > FEC that for a given sector return an error about bad CRC but one can
> > carry on and read the rest of the disk.  It was from this perspective
> > that I proposed the question that led to this thread.
> > 
> > If drives are atomic in this way, it seems that the only way to achieve
> > redundancy is through multiple copies (either manually done or via
> > raid1).
> > 
> > I'm still hoping that someone who knows how Linux software RAID works can
> > tell me how it decides that a drive has failed.  This question was posed
> > in a thread about raid1 internals.
> > 
> 
> I quite agree.  But in the absence of error-correction codes, 
> uncompressed is better.
> 
> And if your error-correction software should happen to be unusable in several 
> years, your errors will not be easy to correct.
> 
> Did you ever write any code in the 1970's that can't be run any more?
> I did.
> 

Thanks Hendrik,

I understand what you mean about compression, and that seems to be the
consensus not just here but in general: without error correction, don't
compress, so that recovery is possible; with error correction, compress
to save space.
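
As a rough illustration of that rule of thumb, here is a minimal Python
sketch.  It assumes zlib as a stand-in for bzip2, and the sample data and
the damage() helper are made up for the demo.  It inverts one byte in an
uncompressed copy and one byte in a compressed copy and compares what
survives.

```python
import zlib

# Hypothetical sample data standing in for a tar archive.
plain = "".join(
    "record %d: some backup payload\n" % i for i in range(500)
).encode("ascii")
packed = zlib.compress(plain)


def damage(data, position):
    """Return a copy of data with one byte inverted at position."""
    broken = bytearray(data)
    broken[position] ^= 0xFF
    return bytes(broken)


# Uncompressed: the damage stays confined to the byte that was hit.
bad_plain = damage(plain, len(plain) // 2)
bytes_lost = sum(1 for a, b in zip(plain, bad_plain) if a != b)

# Compressed: the decoder rejects the damaged stream (zlib.error),
# because later data depends on what came before it.
try:
    zlib.decompress(damage(packed, len(packed) // 2))
    compressed_survived = True
except zlib.error:
    compressed_survived = False

print("uncompressed bytes lost:", bytes_lost)
print("compressed archive decoded cleanly:", compressed_survived)
```

Specialised recovery tools can sometimes salvage part of a damaged
compressed stream, so this shows only the default behaviour of the
decoder, not a hard limit.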

I was 4 years old in 1970.  However, later (forget the year) I had a
Timex/Sinclair 1000.  The programs I wrote in BASIC can be, and sometimes
have been, ported to my Sharp PC-1401's BASIC, and later ported to
Python on Linux.  The facilitator, as always, is well-documented,
non-cryptic code.  However, the OS I made for the first computer I built
from scratch was written in Z-80 machine code in hex, so it isn't any good
for anything else.  Anything I write now is either in Python or
Fortran 77.  I strive to keep everything portable and minimize the use of
add-ons.  Pure Fortran 77 should always be compilable on pretty
much anything.  If a Python interpreter goes out of style, at least the
source forms a great prototype for a new port.  I don't do C or Perl, and
I only program sh like a DOS bat file (if flow control is needed I
switch to Python), which I suppose makes me an oddity on *NIX.

That's why I was looking for an existing archive format with built-in FEC.
Anything I cobble together would have to be backed up separately so that
restoration would be possible.  I __really__ wish that FEC were a
standard, readily available option of tar, cpio, or afio.  Par2 is
available as a package, but will it always be?  If I just archive its
executable, it may not work with whatever the libs du jour are, so I may
try to learn how to compile it from source, statically linked.
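
To make the idea of redundancy concrete, here is a minimal sketch in
plain Python with no add-ons.  It is not how par2 works: par2 uses
Reed-Solomon codes and can repair several damaged blocks, whereas this
uses a single XOR parity block and can only rebuild one block whose
position is already known to be bad.  The block size, sample data, and
function names are all assumptions made for the demo.

```python
import zlib

BLOCK_SIZE = 64  # hypothetical block size, chosen only for the demo


def split_blocks(data, size):
    """Split data into fixed-size blocks, zero-padding the last one."""
    return [
        data[i:i + size].ljust(size, b"\0")
        for i in range(0, len(data), size)
    ]


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# Hypothetical sample data standing in for a compressed tar archive.
original = "".join(
    "file %d contents\n" % i for i in range(2000)
).encode("ascii")
archive = zlib.compress(original)

blocks = split_blocks(archive, BLOCK_SIZE)
parity = xor_blocks(blocks)  # one extra block of redundancy

# Pretend the medium lost one block; its position is known.
lost_index = len(blocks) // 2
survivors = [b for i, b in enumerate(blocks) if i != lost_index]

# XOR of the surviving blocks and the parity block gives the lost block.
rebuilt = xor_blocks(survivors + [parity])

repaired = blocks[:lost_index] + [rebuilt] + blocks[lost_index + 1:]
# Drop the zero padding before decompressing.
restored = zlib.decompress(b"".join(repaired)[:len(archive)])
print("archive restored intact:", restored == original)
```

The original archive length has to be kept alongside the blocks so the
padding can be stripped, and the parity block itself needs to live
somewhere that does not fail together with the data.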

All I really want is an archive medium (and format) as robust as
pigment-on-parchment, that can store 80 GB in about 300 cubic
centimetres (the data density of a 2.5" drive in a ruggedized
enclosure).  I guess this is the holy grail of data storage and is
both the bread and butter of the big specialized companies and the
reason that banks still print everything out somewhere.

I wonder what NASA did for their deep-space probes like Voyager?  The
recent stuff seems to be disposable (e.g. how long will this one last?),
but Voyager was meant to keep on running.  They used some sort of
gold-pressed record for ETI to read, but I wonder what they used for the
computer's OS and data storage in between downloads?

Thanks

Doug.




