
Re: deduplicating file systems: VDO with Debian?



hw wrote: 
> > As you say, deduplication in backup systems is quite common, and works
> > pretty well. There's also an on-disk non-filesystem utility, rdfind,
> > which is packaged in Debian. It can discover identical files and make
> > them hardlinks.
> 
> Well, if I had all the disk space to hold 2 full copies of the data to be able
> to deduplicate it only later, I wouldn't need to deduplicate anything.

Only two copies? That's not a good use case for any of the
deduplicators.

The point of rdfind is to run it from a cron job while some process is
generating duplicate files. For example, a backup process that copies a
filesystem every six hours will generate four identical copies of almost
every file each day. (rsnapshot would do a better job here, since it
hardlinks unchanged files to the previous snapshot as it goes.)
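To make the "discover identical files and make them hardlinks" idea
concrete, here is a minimal Python sketch of the same general technique:
group files by size, confirm duplicates with a content hash, then replace
each duplicate with a hardlink to the first copy. This is NOT rdfind's
code or its exact algorithm; the function names (hardlink_duplicates,
file_digest) and the dry_run parameter are hypothetical, chosen for
illustration. It assumes everything lives on one filesystem, since
hardlinks cannot cross filesystems.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root, dry_run=True):
    """Hypothetical sketch: find identical regular files under root and,
    unless dry_run is set, replace duplicates with hardlinks to one copy.
    Returns a list of (duplicate, original) pairs."""
    # Step 1: cheap pre-filter, since only files of equal size can match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, devices, etc.
            by_size[os.path.getsize(path)].append(path)

    linked = []
    for size, paths in by_size.items():
        # Skip unique sizes, and skip empty files (linking them gains nothing).
        if size == 0 or len(paths) < 2:
            continue
        # Step 2: confirm equality by hashing the full contents.
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        for group in by_hash.values():
            group.sort()  # deterministic choice of which copy to keep
            original = group[0]
            orig_st = os.stat(original)
            for dup in group[1:]:
                st = os.stat(dup)
                if (st.st_dev, st.st_ino) == (orig_st.st_dev, orig_st.st_ino):
                    continue  # already hardlinked to the same inode
                if st.st_dev != orig_st.st_dev:
                    continue  # hardlinks cannot cross filesystems
                linked.append((dup, original))
                if not dry_run:
                    # Link under a temporary name, then atomically rename
                    # over the duplicate so it never goes missing.
                    tmp = dup + ".dedup-tmp"
                    os.link(original, tmp)
                    os.replace(tmp, dup)
    return linked
```

Run from cron after each backup, a second pass finds nothing new to link,
because the earlier duplicates already share one inode. The size pre-filter
keeps the expensive hashing limited to files that could actually match.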


> And how would pretending there are two backups while there's actually only one
> because it got deduplicated be better than having only one backup to begin with?
> (Yeah I haven't thought of that before ...)

It's not two backups, it's two very similar backups taken at
different times, so the majority of the files are the same but
some are different. If you want a second backup, it needs to go
on a different machine, preferably in a different location.

Maybe you should tell us what your actual use case is rather
than asking about realtime deduplication? It could be that
there's a completely different solution which would make you
happy.

-dsr-

