
Re: defining deduplication (was: Re: deduplicating file systems: VDO with Debian?)



On 08/11/2022 at 05:13, hw wrote:
On Mon, 2022-11-07 at 13:57 -0500, rhkramer@gmail.com wrote:


I didn't (and don't) know much about deduplication (beyond what you might
deduce from the name), so I googled and found this article, which was helpful
to me:

    * [[https://www.linkedin.com/pulse/lets-know-vdo-virtual-data-optimizer-ganesh-gaikwad][Lets know about VDO (virtual data optimizer)]]

That's a good pointer, but I still wonder how VDO actually works.  For example,
if I have a volume with 5TB of data on it and I write a 500kB file to that
volume a week later or whenever, and the file I'm writing is identical to
another file somewhere within the 5TB of data already on the volume, how does
VDO figure out that both files are identical?  ZFS does it by keeping lots of
data in memory so it can look it up right away, but what about VDO?  Will it
write the new file first and check it later in the background, re-using the
space afterwards, or will it delay the write to check it first?  Or does it do
something else?

There are some elements of an answer in the Red Hat documentation:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/vdo-integration
and in a blog post that discusses the performance trade-offs:
https://www.redhat.com/en/blog/look-vdo-new-linux-compression-layer

From what I understand, VDO was designed as a kernel-space layer that provides deduplication and compression to local or distributed filesystems that lack those features. The primary goal was to optimize storage space for providers hosting networked virtual machines for their customers.
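
To give a rough idea of the mechanism, here is a toy sketch in Python. It is
not VDO's actual code: as far as I understand the Red Hat doc, VDO fingerprints
4 KiB blocks on the write path and looks them up in its UDS index, which is far
more memory-efficient than the plain in-memory dict used here.

    import hashlib

    BLOCK_SIZE = 4096  # deduplication granularity; VDO works on 4 KiB blocks

    class DedupStore:
        """Toy inline-deduplication store: fingerprint each block on write
        and check an index before allocating new physical space."""

        def __init__(self):
            self.index = {}    # fingerprint -> physical block number
            self.blocks = []   # the "physical" storage

        def write_block(self, data):
            # Fingerprint the block as it is written (inline), before any
            # space is allocated for it.
            fp = hashlib.sha256(data).digest()
            if fp in self.index:
                # Duplicate: point the new logical block at the existing one.
                return self.index[fp]
            # New data: store it and remember its fingerprint.
            self.blocks.append(data)
            pbn = len(self.blocks) - 1
            self.index[fp] = pbn
            return pbn

        def write_file(self, payload):
            # Split a file into fixed-size blocks and write each one,
            # returning the list of physical blocks it maps to.
            return [self.write_block(payload[i:i + BLOCK_SIZE])
                    for i in range(0, len(payload), BLOCK_SIZE)]

    store = DedupStore()
    first = store.write_file(b"some data" * 2000)    # ~18 kB "file"
    second = store.write_file(b"some data" * 2000)   # identical file, written later
    assert first == second      # both map to the same physical blocks
    print(len(store.blocks))    # the data is stored only once

In the second write_file() call, every block's fingerprint is already in the
index, so no new space is allocated; the file just ends up referencing the
existing blocks. As far as I can tell that is essentially what VDO does inline
on the write path, rather than scanning for duplicates afterwards in the
background.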

