
Re: deduplicating file systems: VDO with Debian?



On Wed, 09 Nov 2022 13:28:46 +0100
hw <hw@adminart.net> wrote:

> On Tue, 2022-11-08 at 09:52 +0100, DdB wrote:
> > On 08.11.2022 at 05:31, hw wrote:
> > > > That's only one point.  
> > > What are the others?
> > >   
> > > > And it's not really a valid one, I think, as you typically
> > > > do not run into space problems with one single action
> > > > (YMMV). Running multiple sessions and out-of-band
> > > > deduplication between them works for me.
> > > That still requires you to have enough disk space for at least
> > > two full backups.
> > > I can see it working for three backups because you can
> > > deduplicate the first two, but not for two.  And why would I
> > > deduplicate when I have sufficient disk space?
> > >   
> > Your wording likely confuses 2 different concepts:  
> 
> Noooo, I'm not confusing that :)  Everyone says so and I don't know
> why ...
> 
> > Deduplication avoids storing identical data more than once.
> > whereas
> > Redundancy stores information in more than one place on purpose to
> > avoid loss of data in case of havoc.
> > ZFS can do both, as it combines the features of a volume manager
> > with those of a filesystem and a software RAID. (I have been using
> > zfsonlinux since its early days, for over 10 years now, but without
> > dedup.)
> > 
> > In the past, I used shifting/rotating external backup media for that
> > purpose, because, as the saying goes: RAID is NOT a backup! Today, I
> > have a second server only for the backups, using zfs as well, which
> > allows for easy incremental backups, minimizing traffic and disk
> > usage.
> > 
> > But you should be clear as to what you want: redundancy or
> > deduplication?  
> 
> The question is rather if it makes sense to have two full backups on
> the same machine for redundancy and to be able to go back in time, or
> if it's better to give up on redundancy and to have only one copy and
> use snapshots or whatever to be able to go back in time.

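And the answer is no: two full backups on the same machine don't make
sense. The redundancy you gain from this is almost, though not quite,
meaningless, because of the large set of common data-loss scenarios
(the machine itself failing, theft, fire, malware, operator error)
against which it offers no protection. You've made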
it clear that the cost of storage media is a problem in your situation.
Doubling your backup server's requirement for scarce and expensive disk
space in order to gain a tiny fraction of the resiliency that's
normally implied by "redundancy" doesn't make sense. And being able to
go "back in time" can be achieved much more efficiently by using a
solution (be it off-the-shelf or roll-your-own) that starts with a full
backup and then just stores deltas of changes over time (aka incremental
backups). None of this, for the record, is "deduplication", and I
haven't seen any indication in this thread so far that actual
deduplication is relevant to your use case.
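
To make "full backup plus deltas" concrete, here is a minimal sketch
in Python. It is only an illustration, not any particular tool's
implementation: the file tree is modeled as an in-memory dict of
path -> bytes, and the names make_delta() and restore() are made up
for this example. A real solution would read from disk and persist
the deltas somewhere.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content hash used to decide whether a file has changed."""
    return hashlib.sha256(data).hexdigest()


def make_delta(previous: dict, current: dict) -> dict:
    """Record only what differs between two states of a file tree.

    Both arguments map path -> file content (bytes). This in-memory
    model is an assumption made to keep the sketch self-contained.
    """
    changed = {
        path: data
        for path, data in current.items()
        if path not in previous or digest(previous[path]) != digest(data)
    }
    deleted = sorted(set(previous) - set(current))
    return {"changed": changed, "deleted": deleted}


def restore(full: dict, deltas: list) -> dict:
    """Rebuild a point in time by replaying deltas on the full backup."""
    state = dict(full)
    for delta in deltas:
        state.update(delta["changed"])
        for path in delta["deleted"]:
            state.pop(path, None)
    return state


# Three consecutive days of the same (hypothetical) file tree.
day0 = {"a.txt": b"alpha", "b.txt": b"bravo"}
day1 = {"a.txt": b"alpha", "b.txt": b"bravo v2", "c.txt": b"charlie"}
day2 = {"a.txt": b"alpha", "c.txt": b"charlie"}

full = dict(day0)                      # one full backup
deltas = [make_delta(day0, day1),      # then only the changes
          make_delta(day1, day2)]

yesterday = restore(full, deltas[:1])  # go "back in time" to day1
today = restore(full, deltas)          # latest state, day2
```

The unchanged a.txt is stored exactly once, in the full backup, and
any earlier day can still be rebuilt by replaying fewer deltas. That
is where the space saving over "two full backups" comes from.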

> Of course it would be better to have more than one machine, but I don't
> have that.

Fine, just be realistic about the fact that this means you cannot in
any meaningful sense have "two full backups" or "redundancy". If and
when you can some day devote an RPi tethered to some disks to the job,
then you can set it up to hold a second, completely independent,
store of "full backup plus deltas". And *then* you would have
meaningful redundancy that offers some real resilience. Even better if
the second one is physically offsite. 

In the meantime, storing multiple full copies of your data on one
backup server is just a way to rapidly run out of disk space on your
backup server for essentially no reason.


Cheers!
 -Chris

