
Re: deduplicating file systems: VDO with Debian?



Curt wrote: 
> On 2022-11-08, The Wanderer <wanderer@fastmail.fm> wrote:
> >
> > That more general sense of "backup" as in "something that you can fall
> > back on" is no less legitimate than the technical sense given above, and
> > it always rubs me the wrong way to see the unconditional "RAID is not a
> > backup" trotted out blindly as if that technical sense were the only one
> > that could possibly be considered applicable, and without any
> > acknowledgment of the limited sense of "backup" which is being used in
> > that statement.
> >
> 
> Maybe it's a question of intent more than anything else. I thought RAID
> was intended for a server scenario where if a disk fails, your
> downtime is virtually null, whereas a backup is intended to prevent data
> loss. RAID isn't ideal for the latter because it doesn't ship the saved
> data off-site from the original data (or maybe a RAID array is
> conceivable over a network and a distance?).

RAID means "redundant array of inexpensive disks". The idea, in the
name, is to bring together a bunch of cheap disks to mimic a single more
expensive disk, in a way which hopefully is more resilient to failure.

If you need a filesystem that is larger than a single disk (that you can
afford, or that exists), RAID is the name for the general approach to
solving that.

The three basic technologies of RAID are:

striping: increase capacity by writing parts of a data stream to N
disks. Can increase performance in some situations.

mirroring: increase resiliency by redundantly writing the same data to
multiple disks. Can increase performance of reads.

checksums/erasure coding: increase resiliency by writing data calculated
from the real data (but not a full copy) that allows reconstruction of
the real data from a subset of disks. RAID5 allows one failure, RAID6
allows recovery from two simultaneous failures, fancier schemes may
allow even more.
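
To make the third idea concrete, here is a minimal sketch of single-parity
reconstruction using XOR, which is the principle behind RAID5. It is only
an illustration: the block contents and the helper name xor_blocks are
made up for this example, and real implementations rotate parity across
disks and work on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte block of a stripe.
data = [b"RAID", b"is a", b"demo"]

# The parity block is the XOR of all data blocks (not a copy of any of them).
parity = xor_blocks(data)

# Simulate losing disk 1, then rebuild its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

XORing the surviving blocks with the parity cancels out everything except
the missing block, which is why one extra disk is enough to survive one
failure.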

You can work these together, or separately.
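
As a rough sketch of how the combinations trade capacity for resilience,
the arithmetic for the common levels looks like this. The function name
and the level labels are my own for illustration, and it assumes
equal-sized disks and, for RAID10, two-way mirrors.

```python
def usable_capacity(level, disks, size_tb):
    """Return (usable TB, disk failures guaranteed survivable).

    Assumes all disks are the same size; RAID10 assumes two-way mirrors.
    """
    if level == "raid0":                      # striping only
        return disks * size_tb, 0
    if level == "raid1":                      # every disk holds a full copy
        return size_tb, disks - 1
    if level == "raid5":                      # striping plus one parity block
        return (disks - 1) * size_tb, 1
    if level == "raid6":                      # striping plus two parity blocks
        return (disks - 2) * size_tb, 2
    if level == "raid10":                     # stripe across mirrored pairs
        if disks % 2:
            raise ValueError("raid10 here needs an even number of disks")
        # Only one failure is guaranteed survivable: a second failure
        # in the same mirrored pair loses data.
        return (disks // 2) * size_tb, 1
    raise ValueError(f"unknown level: {level}")

# Example: four 2 TB disks arranged as RAID5.
capacity, failures = usable_capacity("raid5", 4, 2)
```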

Now, RAID is not a backup because it is a single store of data: if you
delete something from it, it is deleted. If you suffer a lightning
strike to the server, there's no recovery from molten metal.

Some filesystems have snapshotting. Snapshotting can protect you from
the accidental deletion scenario, by allowing you to recover quickly,
but does not protect you from lightning.

The lightning scenario requires a copy of the data in some other
location. That's a backup.

You can store the backup on a RAID. If the backup is larger than any
single disk, you might need to: either store it on a RAID, or break it
up into pieces to store on tapes, optical disks, or individual hard
disks. The kind of RAID you choose for the backup is independent of the
kind of RAID you use on your primary storage.

> Of course, I wouldn't know one way or another, but the complexity (and
> substantial verbosity) of this thread seem to indicate that all
> these concepts cannot be expressed clearly and succinctly, from which I
> draw my own conclusions.

The fact that many people talk about things that they don't understand
does not rule out the existence of people who do understand them. Only
people who understand what they are talking about can explain it clearly
and succinctly.

-dsr-

