
Re: deduplicating file systems: VDO with Debian?



On Tue, 2022-11-08 at 10:26 +0100, didier gaumet wrote:
> On 08/11/2022 at 04:49, hw wrote:
> [...]
> > When I want to have 2 (or more) generations of backups, do I actually want
> > deduplication?  It leaves me with only one actual copy of the data, which
> > seems to defeat the idea of having multiple generations of backups, at
> > least to some extent.
> [...]
> 
> I think there is also some confusion here (in my opinion, but I may
> be wrong):
> 
> - deduplication is the action of preventing or correcting multiple
> occurrences of an object. The criterion here is: are the objects identical?
> 
> - incremental/differential backup(2) is the action of backing up only
> objects (or deltas of objects) that have changed between backups, thus
> avoiding duplicates (on the target storage) of objects that have not
> changed.
> But that definitely does not remove duplicates on the source storage
> (that you want to back up), nor does it prevent these duplicates from
> being backed up, so you still end up with duplicates on the target storage.

When you keep N full generations of backups, it's different.  Using rsync and
rotating between the generations, you only write the changes anyway, but most
of the data is stored N times.

Now the question is if it makes sense to keep N full generations of backups when
you can use snapshots and/or deduplication to save space.  Since the data isn't
stored N times anymore, you save space but you have only one copy of most of the
data.
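
To make that trade-off concrete, here is a minimal sketch of block-level
deduplication across three backup generations.  It is a toy model, not how VDO
or any particular file system works: the `DedupStore` class, the 4-byte block
size and the sample data are all made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example stays readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Toy content-addressed store: each distinct block is kept only once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}      # digest -> block (physical storage)
        self.generations: list[list[str]] = []  # per generation: list of digests

    def add_generation(self, data: bytes) -> None:
        digests = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if not seen before
            digests.append(digest)
        self.generations.append(digests)

    def logical_blocks(self) -> int:
        """Blocks you would store with N independent full copies."""
        return sum(len(g) for g in self.generations)

    def physical_blocks(self) -> int:
        """Blocks actually stored after deduplication."""
        return len(self.blocks)

    def restore(self, index: int) -> bytes:
        return b"".join(self.blocks[d] for d in self.generations[index])


# Three generations that differ only in a few blocks (made-up sample data).
gen1 = b"AAAABBBBCCCCDDDD"
gen2 = b"AAAABBBBCCCCEEEE"
gen3 = b"AAAABBBBFFFFEEEE"

store = DedupStore()
for gen in (gen1, gen2, gen3):
    store.add_generation(gen)

print("logical blocks: ", store.logical_blocks())
print("physical blocks:", store.physical_blocks())
```

Every generation can still be restored in full, but the blocks they have in
common exist only once in the store.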

Do you actually need these N copies?  With backups on tapes that you can rotate,
or with backups on multiple machines, having N copies is clearly an advantage.
But when you have it all on the same machine, is there an advantage to having
N copies?

One reason for having N copies would be to be able to go back in time.  But you
can do that with snapshots, so that reason goes away.

Another reason is that the single copy may get damaged.  But when it's all on
the same machine anyway, does it matter?
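
As a rough illustration of the damage question, the sketch below compares N
independent copies with a deduplicated layout when one stored block goes bad.
It only models localized damage (say, a single bad sector); if the whole disk
or machine dies, both layouts are lost alike.  All names and data here are
hypothetical.

```python
import hashlib


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def damaged_generations(generations, physical):
    """Return the indices of generations that reference at least one stored
    block whose content no longer matches the digest recorded for it."""
    bad = []
    for index, refs in enumerate(generations):
        if any(digest(physical[key]) != expected for key, expected in refs):
            bad.append(index)
    return bad


blocks = [b"AAAA", b"BBBB", b"CCCC"]  # made-up, unchanged data
N = 3                                 # number of backup generations

# Layout 1: N independent full copies, keyed by (generation, position).
independent = {(g, i): b for g in range(N) for i, b in enumerate(blocks)}
independent_refs = [
    [((g, i), digest(b)) for i, b in enumerate(blocks)] for g in range(N)
]

# Layout 2: deduplicated, keyed by digest; all generations share the blocks.
dedup = {digest(b): b for b in blocks}
dedup_refs = [[(digest(b), digest(b)) for b in blocks] for _ in range(N)]

# Damage exactly one physical block in each layout.
independent[(0, 1)] = b"XXXX"
dedup[digest(b"BBBB")] = b"XXXX"

print("independent copies, damaged generations:",
      damaged_generations(independent_refs, independent))
print("deduplicated, damaged generations:      ",
      damaged_generations(dedup_refs, dedup))
```

In this model the independent layout loses one generation, while the
deduplicated layout loses the same block in every generation at once.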


> 
> (1) Wikipedia article on deduplication
> https://en.wikipedia.org/wiki/Data_deduplication
> (2) Wikipedia article on backups, with explanations of incremental,
> differential and deduplicated backups:
> https://en.wikipedia.org/wiki/Backup#Deduplication
> 

