
Re: backup archive format saved to disk



Douglas Tutty wrote:
On Tue, Dec 05, 2006 at 05:47:23PM -0600, Mike McCarty wrote:

Johannes Wiedersich wrote:

Douglas Tutty wrote:


I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.

[snip]


Now, to something completely different....
If data integrity is your concern, then maybe a better solution than
compression is to copy all your data with rsync or another backup tool
that 'mirrors' your files instead of packing them all together in one
large file. If something goes wrong with this large file you might lose
the backup of all your files. If something goes wrong with the

[snip]

My understanding of the BZ2 format is that it compresses individual
blocks independently, and that the loss of a block will not compromise
the entire archive, only those files which are contained in a given
block.


Yes. But I don't want to lose any data at all.

Of course not. I was responding to Johannes' statement that one
risks entire loss. This is true with, for example, gzip of a tar,
but not with bzip2.

I've looked at par2.  It looks interesting.  For me, the question is how
to implement it for archiving onto a drive since the ECC data are
separate files rather than being included within one data stream.

You could implement your own FEC. A very simple form of FEC is simply
three copies, which you can do by hand. Another possibility is simply
to keep two copies of the BZ2 and read any bad blocks from the other
copy. This corresponds more closely to the request-retransmission
model than to FEC, but is reasonable in this circumstance.
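
As a rough illustration of those two ideas, here is a minimal Python
sketch. It is not taken from any existing tool. The function names, the
4096-byte block size, and the use of stored SHA-256 digests to decide
which block is bad are all assumptions made for the example; on a real
drive, a read error may already tell you which block is bad.

```python
import hashlib
from collections import Counter

BLOCK = 4096  # hypothetical block size, chosen only for illustration


def majority_vote(copies):
    """Rebuild data from several equal-length copies, byte by byte.

    With three copies, any byte damaged in only one copy is outvoted
    by the other two. If all copies disagree at a position, the value
    from the first copy wins, which may be wrong.
    """
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must have equal length")
    out = bytearray()
    for column in zip(*copies):
        out.append(Counter(column).most_common(1)[0][0])
    return bytes(out)


def block_digests(data, block=BLOCK):
    """Digest of every block, computed while the data is known good."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def repair_from_second_copy(primary, secondary, digests, block=BLOCK):
    """Take each block from the primary copy unless its digest fails,
    in which case fall back to the same block of the secondary copy."""
    out = bytearray()
    for n, want in enumerate(digests):
        start = n * block
        a = primary[start:start + block]
        b = secondary[start:start + block]
        if hashlib.sha256(a).digest() == want:
            out += a
        elif hashlib.sha256(b).digest() == want:
            out += b
        else:
            raise ValueError(f"block {n} is damaged in both copies")
    return bytes(out)
```

The two-copy variant only works if damage never hits the same block in
both copies, and the digests themselves must be stored somewhere safe.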

One thing to bear in mind is that, no matter how good an FEC method
you use, you are going to have to store about 2x redundant data
to get anything out of it. IOW, the data + parity information is going
to be about 3x the size of the data alone for any reasonable ability
to correct anything.

Separate files suggest that it be on a file system, and we're back to
where we started since I haven't found a parfs.

I don't understand this statement. If you have a means to create FEC
checksums, and a way to store those, and a way to use the FEC checksums
along with a damaged copy of the file to reconstruct it, then why
do you need some special kind of FS to store it?
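
To make that concrete, here is a small Python sketch of recovery data
held separately from the data it protects. It uses plain XOR parity
over equal-sized blocks, which is a much weaker scheme than the
Reed-Solomon coding par2 uses and is swapped in here only because it
fits in a few lines. The function names are hypothetical. It can
rebuild exactly one block per group, and only when you already know
which block is bad.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for blk in blocks:
        if len(blk) != len(parity):
            raise ValueError("blocks must have equal length")
        for i, b in enumerate(blk):
            parity[i] ^= b
    return bytes(parity)


def rebuild_block(blocks, parity, missing):
    """Recompute blocks[missing] from the surviving blocks plus parity.

    XOR-ing every surviving block with the parity block cancels them
    out and leaves the contents of the one missing block.
    """
    survivors = [b for i, b in enumerate(blocks) if i != missing]
    return xor_parity(survivors + [parity])


# The parity block is just more bytes; it could sit in an ordinary
# file next to the archive on any file system.
blocks = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(blocks)
damaged = [blocks[0], None, blocks[2]]  # block 1 is unreadable
restored = rebuild_block(damaged, parity, missing=1)
```

Nothing in this depends on where the parity bytes live, only on having
a program that knows how to combine them with the damaged data.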

I suppose I could use par2 to create the ECC files, then feed the ECC
files one at a time, followed by the main data file, followed by the ECC
files again.

Why two copies of the FEC information?

I'll check out with my zip drive if I can write a tar file directly to
disk without a fs (unless someone knows the answer).

Why do you insist on not having a FS? Even if you don't have an FS,
I don't see why you want to separate the FEC information, unless you
don't have a program which can manage the information you're trying
to store. If that be the case, then the FEC information won't do
any good anyway.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!


