Re: deduplicating file systems: VDO with Debian?

To: hw <hw@adminart.net>
Cc: debian-user@lists.debian.org
Subject: Re: deduplicating file systems: VDO with Debian?
From: Dan Ritter <dsr@randomstring.org>
Date: Tue, 8 Nov 2022 07:19:25 -0500
Message-id: <20221108121925.ug3e4amspm4a5trl@randomstring.org>
In-reply-to: <62fcf32f959389850bf6409be792b285b1738fd7.camel@adminart.net>
References: <45998a8cdc61d0945fc5907bc8b30b697b3f5703.camel@adminart.net> <CAKkunMYo=BUfLPkzme2YvsP6vz4dXd4+BU=k25D=nSgD0Fs78w@mail.gmail.com> <2204fb91aba42873cb7513b2514e4ffa975f5b6f.camel@adminart.net> <tkamrf$17q8$1@ciao.gmane.io> <20221107143007.vtqfhf325gcizc76@randomstring.org> <62fcf32f959389850bf6409be792b285b1738fd7.camel@adminart.net>

hw wrote: 
> > As you say, deduplication in backup systems is quite common, and works
> > pretty well. There's also an on-disk non-filesystem utility, rdfind,
> > which is packaged in Debian. It can discover identical files and make
> > them hardlinks.
> 
> Well, if I had all the disk space to hold 2 full copies of the data to be able
> to deduplicate it only later, I wouldn't need to deduplicate anything.

Only two copies? That's not a good use case for any of the
deduplicators.

The point of rdfind is to use it in a cron job while some process is
generating duplicate files. For example, a backup process that copies a
filesystem every six hours will generate four identical copies of almost
every file each day. (rsnapshot would do a better job, here.)
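
To make the idea concrete, here is a rough Python sketch of what such a
cron job does: find files with identical contents and replace the extra
copies with hardlinks. This is not rdfind's actual algorithm, just an
illustration. The function names (file_digest, hardlink_duplicates) are
made up for this example, and it assumes everything under the given
directory lives on one filesystem, since hardlinks cannot cross
filesystems.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace identical regular files under root with hardlinks.

    Hypothetical helper, assuming root is on a single filesystem.
    Returns the number of files that were replaced by a link.
    """
    # Group by size first so only plausible duplicates get hashed.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    linked = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        first_seen = {}  # digest -> first path seen with that content
        for path in sorted(paths):
            keeper = first_seen.setdefault(file_digest(path), path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # the keeper itself, or already linked to it
            # Link under a temporary name, then rename over the duplicate,
            # so the path never disappears partway through.
            tmp = path + ".dedup-tmp"
            os.link(keeper, tmp)
            os.replace(tmp, path)
            linked += 1
    return linked
```

Note that after linking, all names share one inode, so they also share
ownership, permissions and timestamps, and a write through one name
changes the content seen through all of them. That is fine for backup
copies that are never modified, and a poor fit for live data.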

> And how would pretending there are two backups while there's actually only one
> because it got deduplicated be better than having only one backup to begin with?
> (Yeah I haven't thought of that before ...)

It's not two backups, it's two very similar backups taken at
different times, so the majority of the files are the same but
some are different. If you want a second backup, it needs to go
on a different machine, preferably in a different location.
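
As an illustration of "very similar backups taken at different times",
here is a minimal Python sketch of a hardlinking snapshot: files that
are unchanged since the previous snapshot are hardlinked to it, and only
changed or new files are copied. This is roughly the idea behind
rsnapshot-style snapshots, not rsnapshot's real implementation. The
function name snapshot and its arguments are invented for the example,
and it assumes both snapshot directories are on the same filesystem.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, target):
    """Copy source into target, hardlinking files unchanged since previous.

    Hypothetical helper: previous is the prior snapshot directory, or
    None for the first run. Returns a (linked, copied) tuple of counts.
    """
    linked = copied = 0
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dst_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            # shallow=False compares contents byte by byte, not just metadata.
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: costs no extra data blocks
                linked += 1
            else:
                shutil.copy2(src, dst)  # new or changed: a real copy
                copied += 1
    return linked, copied
```

Each snapshot directory looks like a complete copy, but disk usage grows
only by the files that changed between runs. It is still one backup on
one disk.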

Maybe you should tell us what your actual use case is rather
than asking about realtime deduplication? It could be that
there's a completely different solution which would make you
happy.

-dsr-
