
Re: deduplicating file systems: VDO with Debian?



On Mon, 2022-11-07 at 21:46 -0800, David Christensen wrote:
> On 11/7/22 19:49, hw wrote:
> > On Mon, 2022-11-07 at 11:32 +0100, didier gaumet wrote:
> 
> > > At the (Linux) filesystem level, I think in-line deduplication is only
> > > provided by ZFS (and perhaps, out-of-tree, BTRFS).
> > 
> > That's what it seems like, except for VDO.  Unfortunately, ZFS is said to
> > need 5-6 GB of RAM for each 1 TB of data, and that would require upgrading
> > my server.
> 
> 
> On my ZFS storage and backup servers, ZFS seems to grab the majority of 
> available memory.  I have been unable to figure out a way to measure 
> memory consumed by deduplication.

Are you deduplicating?  Apparently some people say that bad things happen when
ZFS runs out of memory because of deduplication.
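
For what it's worth, on Linux the current ARC size can apparently be read from
/proc/spl/kstat/zfs/arcstats, and the on-disk dedup table can be inspected
with zpool status.  A rough sketch; the pool name "tank" is made up here:

    # Current ARC size in bytes (the "size" row):
    grep '^size' /proc/spl/kstat/zfs/arcstats
    # Dedup table (DDT) histogram for the pool "tank" (made-up name);
    # the total entry count times a few hundred bytes per entry gives
    # a rough estimate of what the DDT wants in RAM:
    zpool status -D tank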

> > The question is then whether it makes a difference.  It also raises the
> > question of whether I need (want) multiple generations of backups,
> > especially when I end up with only one copy anyway.  Hmm ...
> 
> 
> I put rsync-based backups on ZFS storage with compression and 
> de-duplication.  du(1) reports 33 GiB for the current backups (i.e. the 
> uncompressed and/or duplicated size).  zfs-auto-snapshot takes snapshots 
> of the backup filesystems daily and monthly, and I take snapshots 
> manually every week.  I have 78 snapshots going back ~6 months.  du(1) 
> reports ~3.5 TiB for the snapshots.  'zfs list' reports 86.2 GiB of 
> actual disk usage for all 79 backups.  So, ZFS de-duplication and 
> compression stretch my backup storage by roughly 41:1.
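
If it helps with comparing numbers, ratios like those can apparently be read
straight from ZFS.  A sketch; the pool and dataset names are made up here:

    # Logical vs. compressed usage per dataset ("tank/backup" is made up):
    zfs list -o name,used,refer,compressratio tank/backup
    # Space referenced and consumed by each snapshot of that dataset:
    zfs list -t snapshot -o name,used,refer tank/backup
    # Pool-wide deduplication ratio:
    zpool get dedupratio tank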

I'm unclear on how snapshots fit in when it comes to making backups.  What if
you have a bunch of snapshots and want to get a file from six generations of
backups ago?  I never figured out how to get something out of an old snapshot
and found it all confusing, so I don't even use them.
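
From what I have read, the usual way is the hidden .zfs directory at the root
of each mounted dataset, something like the sketch below (the dataset,
mountpoint, and snapshot names are made up), but I never got comfortable with
it:

    # Every mounted dataset has a hidden, read-only .zfs directory;
    # each snapshot shows up there as a plain directory:
    ls /tank/backup/.zfs/snapshot/
    # Restoring one file from an old snapshot is then an ordinary copy
    # (the snapshot name "weekly-2022-10-02" is made up):
    cp /tank/backup/.zfs/snapshot/weekly-2022-10-02/etc/fstab /tmp/
    # .zfs is hidden from ls by default; it can still be entered by
    # path, or made visible with:
    zfs set snapdir=visible tank/backup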

33 GiB in backups is far from a terabyte.  I have a lot more than that.

> ZFS compression and de-duplication also work well for jails/VMs.
> 
> 
> For general data, I use compression alone.
> 
> 
> For compressed and/or encrypted archives, images, etc., I do not use 
> compression or de-duplication.

Yeah, they wouldn't compress.  Why no deduplication?
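
Both are per-dataset properties, as far as I understand, so mixing policies
like that should be straightforward.  A sketch; the dataset names are made up:

    # Jails/VMs: enable both compression and deduplication:
    zfs set compression=lz4 tank/vm
    zfs set dedup=on tank/vm
    # General data: compression only:
    zfs set compression=lz4 tank/data
    # Already-compressed archives and images: neither:
    zfs set compression=off tank/archives
    zfs set dedup=off tank/archives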

> The key is to only use de-duplication when there is a lot of duplication.

How do you know if there's much to deduplicate before deduplicating?
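
I have read that zdb can simulate deduplication on an existing pool without
enabling it, which would answer that.  A sketch, with a made-up pool name:

    # Build a would-be dedup table for pool "tank" and print its
    # histogram plus an estimated dedup ratio at the end (read-only,
    # but it can take a long time and a lot of RAM on a big pool):
    zdb -S tank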

> And, to a lesser extent, to only use compression on uncompressed data 
> (lz4 detects compressed data and does not try to compress it further).
> 
> 
> My ZFS pools are built with HDD's.  I recently added an SSD-based vdev 
> as a dedicated 'dedup' device, and write performance improved 
> significantly when receiving replication streams.
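
I assume that is the dedicated "dedup" allocation class that OpenZFS 0.8
added; a sketch of adding one, with made-up device names:

    # Add a mirrored pair of SSDs that will hold only the dedup table,
    # keeping DDT reads and writes off the slow HDDs:
    zpool add tank dedup mirror /dev/sdx /dev/sdy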

Hm, with the ZFS pool I set up a couple of years ago, the SSDs wore out, and
removing them without any replacement didn't decrease performance.

I'm not too fond of ZFS, especially where performance is concerned.  But for
backups, that won't matter.

