
Re: deduplicating file systems: VDO with Debian?



On 11/7/22 19:49, hw wrote:
On Mon, 2022-11-07 at 11:32 +0100, didier gaumet wrote:

At (linux) filesystem level, I think in-line deduplication is only
provided by ZFS (and perhaps, out-of-tree, BTRFS)

That's what it seems like, except for VDO.  Unfortunately, ZFS is said to need
5-6 GB of RAM for each 1 TB of data, and that would require upgrading my server.


On my ZFS storage and backup servers, ZFS seems to grab the majority of available memory. I have been unable to figure out a way to measure memory consumed by deduplication.


When I want to have 2 (or more) generations of backups, do I actually want
deduplication?  It leaves me with only one actual copy of the data which seems
to defeat the idea of having multiple generations of backups at least to some
extent.

The question is then whether it makes a difference.  It also raises the question
of whether I need (want) multiple generations of backups, especially when I end
up with only one copy anyway.  Hmm ...


I put rsync-based backups on ZFS storage with compression and de-duplication. du(1) reports 33 GiB for the current backups (i.e. the uncompressed and/or duplicated size). zfs-auto-snapshot takes snapshots of the backup filesystems daily and monthly, and I take snapshots manually every week. I have 78 snapshots going back ~6 months. du(1) reports ~3.5 TiB for the snapshots. 'zfs list' reports 86.2 GiB of actual disk usage for all 79 backups (the current backups plus the 78 snapshots). So, ZFS de-duplication and compression leverage my backup storage by roughly 41:1.
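For anyone who wants to check the arithmetic, here is a minimal Python sketch of how that ratio falls out of the three figures above. The function name leverage_ratio and the 1024 GiB per TiB conversion are my own choices for illustration, and the 3.5 TiB snapshot figure is only approximate, so the result lands in the 41-42 range rather than on an exact value.

```python
GIB_PER_TIB = 1024  # binary units, matching the GiB/TiB figures quoted above


def leverage_ratio(current_gib, snapshots_tib, on_disk_gib):
    """Return logical size (as du sees it) divided by actual disk usage.

    Hypothetical helper for illustration; all sizes come from the post:
    current backups in GiB, snapshots in TiB, 'zfs list' usage in GiB.
    """
    logical_gib = current_gib + snapshots_tib * GIB_PER_TIB
    return logical_gib / on_disk_gib


# 33 GiB current backups, ~3.5 TiB of snapshots, 86.2 GiB actually used
ratio = leverage_ratio(33, 3.5, 86.2)
print(f"approximate leverage: {ratio:.1f}:1")
```

The same calculation works for any pool: feed it the du(1) totals and the 'zfs list' usage to see whether de-duplication is earning its memory cost.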


ZFS compression and de-duplication also work well for jails and VMs.


For general data, I use compression alone.


For compressed and/or encrypted archives, images, etc., I do not use compression or de-duplication.


The key is to only use de-duplication when there is a lot of duplication.


And, to a lesser extent, to only use compression on uncompressed data (lz4 detects compressed data and does not try to compress it further).


My ZFS pools are built with HDDs. I recently added an SSD-based vdev as a dedicated 'dedup' device, and write performance improved significantly when receiving replication streams.


David

