
Re: backup archive format saved to disk



Douglas Tutty wrote:
On Tue, Dec 05, 2006 at 06:27:10PM -0600, Mike McCarty wrote:

Douglas Tutty wrote:


[snip]

One thing to bear in mind is that, no matter how good an FEC method
you use, you are going to have to store about 2x redundant data
to get anything out of it. IOW, the data + parity information is going
to be about 3x the size of the data alone for any reasonable ability
to correct anything.


Par2 seems to be able to do it at about 15%.  It comes down to number
theory and how many corrupted data blocks one needs to be able to
handle.  If 100% of the data blocks are unavailable (worst case) then
you need 100% redundant data (i.e. raid1).

15% to do what?

I have designed some BCH FEC codes for a few systems, so I think I
have a reasonable feel for what is involved.

What you describe is correct only if each bit can take one of three
values: 0, 1, and missing. If the transmission channel can only change
a bit from 0 to missing, or from 1 to missing, and nothing else, then
100% redundancy is adequate for single error correction. If other types
of damage may occur, then 100% redundancy is not adequate. A distance 2
code is adequate if the only change the channel introduces is missing
bits. But if not every damaged bit can be recognized as damaged, then
at least a distance 3 code is required, which needs more than 100%
redundancy.
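
To make that concrete, here is a toy sketch (my own, not par2's actual
scheme) of a distance 2 code: a single XOR parity block over a handful
of data blocks. If exactly one block is missing and we know which one
(an erasure), the parity plus the survivors rebuilds it; a silently
corrupted block is merely detected, not located, so it cannot be
corrected.

    /* Toy sketch, not par2's actual scheme: one XOR parity block over
     * NBLK data blocks.  If exactly one block is missing and we KNOW
     * which one (an erasure), XORing the parity with the survivors
     * rebuilds it.  A silently corrupted block is only detected, never
     * located, by this code, so it cannot be corrected. */
    #include <stdio.h>
    #include <string.h>

    #define NBLK  4
    #define BLKSZ 8

    static void xor_into(unsigned char *dst, const unsigned char *src)
    {
        for (int i = 0; i < BLKSZ; i++)
            dst[i] ^= src[i];
    }

    int main(void)
    {
        unsigned char data[NBLK][BLKSZ] = {
            "block-0", "block-1", "block-2", "block-3"
        };
        unsigned char parity[BLKSZ] = { 0 };
        unsigned char rebuilt[BLKSZ] = { 0 };
        int lost = 2;                      /* pretend block 2 is unreadable */

        for (int b = 0; b < NBLK; b++)
            xor_into(parity, data[b]);     /* parity = XOR of all blocks    */

        xor_into(rebuilt, parity);         /* start from the parity block   */
        for (int b = 0; b < NBLK; b++)
            if (b != lost)
                xor_into(rebuilt, data[b]);

        printf("rebuilt block %d: %s\n", lost, (char *)rebuilt);
        return memcmp(rebuilt, data[lost], BLKSZ) != 0;
    }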

Separate files suggests that it be on a file system, and we're back to
where we started since I haven't found a parfs.

I don't understand this statement. If you have a means to create FEC
checksums, and a way to store those, and a way to use the FEC checksums
along with a damaged copy of the file to reconstruct it, then why
do you need some special kind of FS to store it?


My statement refers to using par2, which doesn't touch the input file(s)
but generates the error-correcting data as separate files.

Wherever they get stored is irrelevant, except insofar as it may aid
the code in burst detection and correction.

What does FEC stand for?  I think ECC stands for Error Checking and
Correcting.

FEC = Forward Error Correction. When a transmission channel makes
error detection with request for retransmission infeasible (like
with space missions, or when the data are recorded, and no other
copy exists to use as a retransmission source, for examples) then
one uses some form of FEC. ECC = Error Correcting Code, which refers
to the code itself, not the technique. EDAC = Error Detection And
Correction, which refers to any number of techniques which may
include error detection with request for retransmit, or FEC,
for examples.

I suppose I could use par2 to create the ECC files, then feed the ECC
files one at a time, followed by the main data file, followed by the ECC
files again.

Why two copies of the FEC information?


What if two blocks on the drive fail, one containing data, the other
containing the ECC info?

Then the information in the check and data bits is used to correct them.

In a properly designed code the check bits are themselves part of the
correctable data, so that errors in them are correctable. The check bits
are not treated any differently from any other bits. They are all just
data. If the total number of bits which are damaged does not exceed
the ability of the code to correct, then they are all recovered.
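
As a small illustration (a textbook Hamming(7,4) code, sketched here
only to show the principle, not what par2 uses internally), the decoder
below computes a syndrome which names the position of any single-bit
error, and it makes no difference whether that position holds a data
bit or a check bit:

    /* Hamming(7,4) sketch: 4 data bits, 3 check bits.  The syndrome
     * names the position of any single-bit error, whether the error
     * hit a data bit or a check bit -- they are all just bits. */
    #include <stdio.h>

    /* codeword bits c[1..7]; the check bits live at positions 1, 2, 4 */
    static void encode(const int d[4], int c[8])
    {
        c[3] = d[0]; c[5] = d[1]; c[6] = d[2]; c[7] = d[3];
        c[1] = c[3] ^ c[5] ^ c[7];
        c[2] = c[3] ^ c[6] ^ c[7];
        c[4] = c[5] ^ c[6] ^ c[7];
    }

    /* returns the errored position (1..7), or 0 if the word is clean */
    static int correct(int c[8])
    {
        int s = (c[1] ^ c[3] ^ c[5] ^ c[7])
              | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
              | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2;
        if (s)
            c[s] ^= 1;                /* flip the bad bit back */
        return s;
    }

    int main(void)
    {
        int d[4] = { 1, 0, 1, 1 }, c[8];

        encode(d, c);
        c[2] ^= 1;                    /* damage a *check* bit  */
        printf("corrected position %d\n", correct(c));

        c[6] ^= 1;                    /* damage a data bit     */
        printf("corrected position %d\n", correct(c));
        return 0;
    }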

I'll check out with my zip drive if I can write a tar file directly to
disk without a fs (unless someone knows the answer).

Why do you insist on not having an FS? Even if you don't have an FS,
I don't see why you want to separate the FEC information, unless you
don't have a program which can manage the information you're trying
to store. If that be the case, then the FEC information won't do
any good anyway.

I don't insist on not having an FS.  But how well does an FS work when
bad blocks crop up?  If it doesn't incorporate ECC itself then it either
drops the data from the bad blocks or at worst can't be mounted.  The
question is, do I need a FS?  If I don't, isn't it just one more
potential point of failure?

How well does the disc work with bad blocks? If you have errors which
the disc itself cannot correct, then you are going to have to do
very low level recovery, indeed. That is why I suggested not doing that,
but rather using your own FEC, by having redundant copies of the entire
disc. In this wise, you don't have to go about trying to recover
whatever high level information may be on the disc and fix the
data storage format itself. If you do that, then it doesn't matter
whether there is a file system present, nor, if it is present, whether
it can recover from corrupted blocks. You do the low level recovery,
and then whatever data were on the discs are recovered.

To put it another way, if the disc is unable to read its platters,
then it can't and you aren't going to get data, anyway, for those
sectors. It's better, then, not to try to layer on top of something
that is going to lose large blocks by trying to do it on the
same device, but rather to rely on a separate device. To get data
for those sectors, you'd have to issue low level commands to the
controller, instructing it to do long-reads and ignore errors.

I've done that once with 360K floppies, and recovered data that
way, but I wouldn't want to go through learning how to do that
again for whatever hard disc I might have. For one thing, modern
discs do sector remapping, and I'd have to go through all that
rigamarole of finding out (if the manufacturer would even disclose)
how that takes place, and how to instruct the disc to ignore it,
and what the FEC code used on the sectors is, etc.

An added benefit of this is that it essentially creates a code which
has the ability to correct bursts as long as the whole disc, which is
quite long, indeed.

To put it another way, suppose we use CDROMs to store our information
in ISO format, and make three copies. Suppose that we get read errors
on the discs later. The easy way to handle this is to rip the ISO
images from the three discs, errors and all. Then we use the majority
rule to examine each bit. If two of the images agree that a bit is
a 1, then we put out a 1. If two of the images agree that a bit is
a 0, then we put out a 0. When we are through, then if there are
not any double errors, we have a correct ISO image which may be
mounted, or written to new media or whatever. It doesn't matter
whether the data form a file system, nor whether the file system
have any ability to correct errors. What matters is that, whatever
the format of the data on the storage medium, we can recover it.
We don't try to repair the file system, we repair the underlying bits.
Repairing the file system is too laborious.
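
A rough sketch of that majority vote, in C, assuming the three images
were ripped to ordinary files of the same size (the file names below
are only placeholders):

    /* Read three ripped images of the same disc and write the bitwise
     * two-out-of-three vote of every byte to stdout, e.g.
     *
     *     ./vote copy1.iso copy2.iso copy3.iso > repaired.iso
     *
     * A bit damaged in any single copy comes out right; a bit damaged
     * in two copies at the same position does not. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        FILE *f[3];
        int a, b, c, i;

        if (argc != 4) {
            fprintf(stderr, "usage: %s img1 img2 img3 > out\n", argv[0]);
            return 1;
        }
        for (i = 0; i < 3; i++) {
            f[i] = fopen(argv[i + 1], "rb");
            if (!f[i]) { perror(argv[i + 1]); return 1; }
        }
        while ((a = getc(f[0])) != EOF &&
               (b = getc(f[1])) != EOF &&
               (c = getc(f[2])) != EOF) {
            /* each output bit is 1 iff at least two inputs have a 1 */
            putchar((a & b) | (a & c) | (b & c));
        }
        return 0;
    }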

If we can't recover an ISO image (or other file system image), then we
can't recover raw data written to disc, either. If we can recover a raw
disc write, then we can also recover an ISO image, or any other file
system format. It's all just bits recorded on a single long spiral on
the disc. To the bits themselves, it makes no difference if they got
there to make up a file system, or got there as a raw image. They are
all raw images, in the end. It's just a matter of a raw image of what.
A raw image of a file system is still just bits.

So, ISTM that whether any file system be used for storage is irrelevant;
one might as well go ahead and use a file system for ease of mounting
and reading, unless the overhead of the directory entries is something
one wants to avoid. But in that case, one is going to lose permissions,
and dates, etc.

OTOH, if you want to save space, then a raw image of a compressed
archive file like a tarball will be smaller. But that is a separate
issue from data recovery.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!


