
Re: defining deduplication (was: Re: deduplicating file systems: VDO with Debian?)



On 08/11/2022 at 05:13, hw wrote:
On Mon, 2022-11-07 at 13:57 -0500, rhkramer@gmail.com wrote:


I didn't (and don't) know much about deduplication (beyond what you might
deduce from the name), so I googled and found this article, which was helpful
to me:

    * [[https://www.linkedin.com/pulse/lets-know-vdo-virtual-data-optimizer-ganesh-gaikwad][Lets know about VDO (virtual data optimizer)]]

That's a good pointer, but I still wonder how VDO actually works.  For example,
if I have a volume with 5TB of data on it and I write a 500kB file to that
volume a week later or whenever, and the file I'm writing is identical to
another file somewhere within the 5TB of data already on the volume, how does
VDO figure out that both files are identical?  ZFS does it by keeping lots of
data in memory so it can look it up right away, but what about VDO?  Will it
write the new file first and check it later in the background, re-using the
space afterwards, or will it delay the write to check it first?  Or does it do
something else?

There are some elements of an answer in the Red Hat documentation:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/vdo-integration
and in a blog post that discusses the performance trade-offs:
https://www.redhat.com/en/blog/look-vdo-new-linux-compression-layer

From what I understand, VDO was designed as a kernel-space layer that provides deduplication and compression to local or distributed filesystems that lack those features. The primary goal was to optimize storage space for providers hosting networked virtual machines for their customers.
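
To give a rough idea of the mechanism, here is a toy sketch in Python. It is
not VDO's actual code: as far as I understand the Red Hat doc, VDO fingerprints
4 KiB blocks on the write path and looks them up in its UDS index, which is far
more memory-efficient than the plain in-memory dict used here.

    import hashlib

    BLOCK_SIZE = 4096  # deduplication granularity; VDO works on 4 KiB blocks

    class DedupStore:
        """Toy inline-deduplication store: fingerprint each block on write
        and check an index before allocating new physical space."""

        def __init__(self):
            self.index = {}    # fingerprint -> physical block number
            self.blocks = []   # the "physical" storage

        def write_block(self, data):
            # Fingerprint the block as it is written (inline), before any
            # space is allocated for it.
            fp = hashlib.sha256(data).digest()
            if fp in self.index:
                # Duplicate: point the new logical block at the existing one.
                return self.index[fp]
            # New data: store it and remember its fingerprint.
            self.blocks.append(data)
            pbn = len(self.blocks) - 1
            self.index[fp] = pbn
            return pbn

        def write_file(self, payload):
            # Split a file into fixed-size blocks and write each one,
            # returning the list of physical blocks it maps to.
            return [self.write_block(payload[i:i + BLOCK_SIZE])
                    for i in range(0, len(payload), BLOCK_SIZE)]

    store = DedupStore()
    first = store.write_file(b"some data" * 2000)    # ~18 kB "file"
    second = store.write_file(b"some data" * 2000)   # identical file, written later
    assert first == second      # both map to the same physical blocks
    print(len(store.blocks))    # the data is stored only once

In the second write_file() call, every block's fingerprint is already in the
index, so no new space is allocated; the file just ends up referencing the
existing blocks. As far as I can tell that is essentially what VDO does inline
on the write path, rather than scanning for duplicates afterwards in the
background.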

