Re: deduplicating file systems: VDO with Debian?

To: debian-user@lists.debian.org
Subject: Re: deduplicating file systems: VDO with Debian?
From: David Christensen <dpchrist@holgerdanske.com>
Date: Mon, 7 Nov 2022 21:46:39 -0800
Message-id: <4f3b5f4a-37a7-c5f0-1eec-5bba2497960a@holgerdanske.com>
In-reply-to: <13894cc6cb833ed3fc005f80e77e1c937cb42d86.camel@adminart.net>
References: <45998a8cdc61d0945fc5907bc8b30b697b3f5703.camel@adminart.net> <CAKkunMYo=BUfLPkzme2YvsP6vz4dXd4+BU=k25D=nSgD0Fs78w@mail.gmail.com> <2204fb91aba42873cb7513b2514e4ffa975f5b6f.camel@adminart.net> <tkamrf$17q8$1@ciao.gmane.io> <13894cc6cb833ed3fc005f80e77e1c937cb42d86.camel@adminart.net>

On 11/7/22 19:49, hw wrote:

On Mon, 2022-11-07 at 11:32 +0100, didier gaumet wrote:

At (linux) filesystem level, I think in-line deduplication is only
provided by ZFS (and perhaps, out-of-tree, BTRFS)


That's what it seems like, except VDO.  Unfortunately, ZFS is said to need
5-6 GB of RAM for each 1 TB of data, and that would require upgrading my server.

On my ZFS storage and backup servers, ZFS seems to grab the majority of
available memory. I have been unable to figure out a way to measure
memory consumed by deduplication.
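
A rough back-of-the-envelope estimate is still possible. The sketch below assumes the commonly quoted rule of thumb of about 320 bytes of RAM per dedup table (DDT) entry, one entry per unique block, and a fixed average block size. All three are assumptions, not measurements from these servers, and estimate_ddt_ram is a made-up helper name.

```python
# Rough estimate of the RAM needed for a ZFS dedup table (DDT).
# Assumptions (not measured): ~320 bytes in core per DDT entry,
# one DDT entry per unique block, and a fixed average block size.

BYTES_PER_DDT_ENTRY = 320  # rule of thumb, an assumption


def estimate_ddt_ram(unique_data_bytes, avg_block_bytes=64 * 1024):
    """Return the estimated DDT size in bytes for the given unique data."""
    entries = unique_data_bytes // avg_block_bytes
    return entries * BYTES_PER_DDT_ENTRY


if __name__ == "__main__":
    one_tib = 1024 ** 4
    for block_kib in (64, 128):
        ram = estimate_ddt_ram(one_tib, block_kib * 1024)
        print(f"{block_kib} KiB blocks: {ram / 1024 ** 3:.1f} GiB per TiB")
```

With 64 KiB average blocks this works out to 5.0 GiB per TiB of unique data, which is in line with the 5-6 GB per TB figure quoted above. Larger blocks need proportionally less. If 'zpool status -D' reports a DDT entry count for the pool, that count should be a better input than the block-size guess.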

When I want to have 2 (or more) generations of backups, do I actually want
deduplication?  It leaves me with only one actual copy of the data which seems
to defeat the idea of having multiple generations of backups at least to some
extent.

The question is then if it makes a difference.  It also creates the question if
I need (want) multiple generations of backups, especially when I end up with
only one copy anyway.  Hmm ...

I put rsync-based backups on ZFS storage with compression and
de-duplication. du(1) reports 33 GiB for the current backups (i.e. the
uncompressed and/or duplicated size). zfs-auto-snapshot takes snapshots
of the backup filesystems daily and monthly, and I take snapshots
manually every week. I have 78 snapshots going back ~6 months. du(1)
reports ~3.5 TiB for the snapshots. 'zfs list' reports 86.2 GiB of
actual disk usage for all 79 backups. So, ZFS de-duplication and
compression leverage my backup storage by 41:1.
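
The 41:1 figure can be rechecked from the numbers above. This is only a sketch of the arithmetic: the sizes are the rounded values from du(1) and 'zfs list', and it is an assumption whether the ~3.5 TiB snapshot total already includes the 33 GiB of current backups, so both cases are shown. The function name leverage is made up for illustration.

```python
# Recompute the storage leverage from the rounded figures in this message.
GIB_PER_TIB = 1024


def leverage(logical_gib, physical_gib):
    """Ratio of apparent (du) size to actual on-disk size."""
    return logical_gib / physical_gib


snapshots_gib = 3.5 * GIB_PER_TIB  # du(1) total for the 78 snapshots (approximate)
current_gib = 33.0                 # du(1) total for the current backups
actual_gib = 86.2                  # 'zfs list' usage for all 79 backups

if __name__ == "__main__":
    print(f"snapshots only:      {leverage(snapshots_gib, actual_gib):.1f}:1")
    print(f"snapshots + current: {leverage(snapshots_gib + current_gib, actual_gib):.1f}:1")
```

Either way the result lands between 41:1 and 42:1.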



ZFS compression and de-duplication also work well for jails and VMs.


For general data, I use compression alone.

For compressed and/or encrypted archives, images, etc., I do not use
compression or de-duplication.



The key is to only use de-duplication when there is a lot of duplication.
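
One way to get a feel for how much duplication a directory tree holds before turning de-duplication on is to hash it in fixed-size blocks and compare total blocks to unique blocks. The sketch below is only an approximation: it assumes fixed 128 KiB blocks (the default ZFS recordsize) and SHA-256, ignores compression, and dedup_ratio is a made-up helper name, not part of any ZFS tool.

```python
import hashlib
import os
import sys


def dedup_ratio(root, block_size=128 * 1024):
    """Return (total_blocks, unique_blocks) for regular files under root."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    while True:
                        block = fh.read(block_size)
                        if not block:
                            break
                        total += 1
                        seen.add(hashlib.sha256(block).digest())
            except OSError:
                continue  # unreadable file: skip it
    return total, len(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    total, unique = dedup_ratio(sys.argv[1])
    if unique:
        print(f"{total} blocks, {unique} unique, ratio {total / unique:.2f}:1")
    else:
        print("no data found")
```

Run it with the directory to scan as the only argument. A ratio close to 1:1 suggests de-duplication would cost memory without saving much space. On an existing pool, 'zdb -S' can simulate de-duplication and is likely to be more accurate than this sketch.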

And, to a lesser extent, to only use compression on uncompressed data
(lz4 detects compressed data and does not try to compress it further).

My ZFS pools are built with HDDs. I recently added an SSD-based vdev
as a dedicated 'dedup' device, and write performance improved
significantly when receiving replication streams.



David
