
Re: Limitations of rsnapshot-style backups (Was: Re: lazy old guy asks question)



On 9/2/25 11:12, Andy Smith wrote:
Hi,


Hello.  :-)


On Tue, Sep 02, 2025 at 09:05:39AM -0400, Dan Ritter wrote:
David Christensen wrote:
a.  Set the ZFS backup file system property "dedup".  This will enable
block-level de-duplication, which can de-duplicate data more than hard links
alone.

This is generally not a good thing to recommend; one of the
authors of the system wrote a good article which should
definitely be read before turning on dedup:
https://despairlabs.com/blog/posts/2024-10-27-openzfs-dedup-is-good-dont-use-it/

I evaluated zfs dedup with my real data as part of deciding what to do,
and it became clear I would need to significantly increase the hardware
I was dedicating to the task and I would also need to re-think the
remote places I am storing further copies (for even more expense).


ZFS will use as little or as much hardware as you give it. My SOHO servers are cobbled together from used, 10+ year old, entry-level server parts, and I am very pleased with their price and performance. At the low end, some readers use Raspberry Pis. At the high end, I seem to recall one user with a mid-range dual-CPU rack server with an external drive bay and 24 disks (?).


I have not ventured into remote ZFS yet.


Having said that, there were other factors in my decision. The
cross-source dedup is not a huge factor. That is, dedup done on all data
across all hosts being backed up isn't amazing. My figures show that 1.6
TB in rsnapshot came out as 920 GB in restic. If we assume that turning
off zfs dedup loses the dedup between backup sources, but the
snapshotting continues to allow only diffs within the backups for each
source to be stored, then it's less than double the capacity needed. And
again, as I said, capacity isn't so difficult.
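
As a rough check on the "less than double" estimate, the two figures quoted above can be compared directly. This is only a sketch: it assumes decimal units (1 TB = 1000 GB), and the variable names are illustrative rather than anything from rsnapshot or restic.

```python
# Figures quoted in the message above.
# Assumption: decimal units, so 1.6 TB is taken as 1600 GB.
rsnapshot_gb = 1600
restic_gb = 920

# How much larger the rsnapshot copy is than the restic repository.
ratio = rsnapshot_gb / restic_gb

print(f"rsnapshot/restic size ratio: {ratio:.2f}")
print("less than double:", ratio < 2)
```

Under that assumption the ratio comes out at about 1.74, which is consistent with needing less than twice the capacity.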

So yeah in summary, probably don't even consider zfs dedup but
do consider zfs.

Thanks,
Andy


I am not sure that I understand your backup storage benchmarks, but I can share mine, FWIW.


Compression and de-duplication are enabled on the backup file system:

2025-09-02 19:56:47 toor@f5 ~
# zfs get compression,dedup p5/backup
NAME       PROPERTY     VALUE           SOURCE
p5/backup  compression  on              local
p5/backup  dedup        verify          local


Here are the statistics for ZFS backups since June 2022 of my Debian daily driver OS disk:

2025-09-02 17:45:16 toor@f5 ~
# zfs get -H -p refer,used p5/backup/laalaa.tracy.holgerdanske.com
p5/backup/laalaa.tracy.holgerdanske.com	referenced	6222024704	-
p5/backup/laalaa.tracy.holgerdanske.com	used	89901633536	-

2025-09-02 17:45:45 toor@f5 ~
# zfs get -H -p -r -o value -t snapshot referenced p5/backup/laalaa.tracy.holgerdanske.com | perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "snapshots $n referenced_total $sum\n"'
snapshots 140 referenced_total 824925708288

2025-09-02 17:46:59 toor@f5 ~
# perl -e 'print 89901633536-6222024704, $/'
83679608832

2025-09-02 17:50:12 toor@f5 ~
# perl -e 'print 824925708288/83679608832, $/'
9.85814489099929


So:

* The most recent backup is 6,222,024,704 bytes.

* There are 140 snapshots of previous backups.

* The snapshots have an apparent size of 824,925,708,288 bytes.

* The snapshots use 83,679,608,832 bytes of storage.

* The snapshot compression plus de-duplication ratio is 9.85814489099929 to 1.
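
The arithmetic above can also be sketched in one place. The following Python is a minimal illustration, not part of the original session: it assumes the tab-separated layout that `zfs get -H -p` printed above (name, property, value, source), and the helper name `parse_zfs_get` is hypothetical. The snapshot total is copied from the Perl output rather than recomputed.

```python
def parse_zfs_get(text):
    """Parse `zfs get -H -p` output into {property: integer value}.

    Assumes one dataset and tab-separated fields:
    name, property, value, source.
    """
    props = {}
    for line in text.strip().splitlines():
        name, prop, value, source = line.split("\t")
        props[prop] = int(value)
    return props


# Output copied from the session above.
output = (
    "p5/backup/laalaa.tracy.holgerdanske.com\treferenced\t6222024704\t-\n"
    "p5/backup/laalaa.tracy.holgerdanske.com\tused\t89901633536\t-\n"
)
props = parse_zfs_get(output)

# Sum of `referenced` over the 140 snapshots, as reported above.
snapshot_referenced_total = 824925708288

# Space attributed to snapshots: everything used minus the live file system.
snapshot_used = props["used"] - props["referenced"]

# Apparent snapshot size divided by the storage the snapshots occupy.
ratio = snapshot_referenced_total / snapshot_used

print("snapshot_used", snapshot_used)
print("ratio", ratio)
```

This reproduces the 83,679,608,832 byte figure and the ratio of roughly 9.86 to 1 from the same inputs.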


Here are the statistics for ZFS backups since June 2022 of various Linux OS disks, FreeBSD OS disks, Windows home directories, macOS home directories, and factory-fresh USB flash drives:

2025-09-02 17:57:21 toor@f5 ~
# zfs get -r -H -p -t filesystem -o value referenced p5/backup | perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "filesystems $n referenced_total $sum\n"'
filesystems 11 referenced_total 34437898240

2025-09-02 17:57:33 toor@f5 ~
# zfs get -r -H -p -t filesystem -o value used p5/backup | perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "filesystems $n used_total $sum\n"'
filesystems 11 used_total 423907360768

2025-09-02 17:58:57 toor@f5 ~
# zfs get -H -p -r -o value -t snapshot referenced p5/backup | perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "snapshots $n referenced_total $sum\n"'
snapshots 1083 referenced_total 3392397172736

2025-09-02 17:59:22 toor@f5 ~
# perl -e 'print 423907360768-34437898240, $/'
389469462528

2025-09-02 18:15:09 toor@f5 ~
# perl -e 'print 3392397172736/389469462528, $/'
8.71030337196748


So:

* The most recent backups are 34,437,898,240 bytes.

* There are 1,083 snapshots of previous backups.

* The snapshots have an apparent size of 3,392,397,172,736 bytes.

* The snapshots use 389,469,462,528 bytes of storage.

* The snapshot compression plus de-duplication ratio is 8.71030337196748 to 1.
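
For readers who prefer it to the Perl one-liners, the count-and-sum step and the final ratio can be sketched in Python. This is an illustration under assumptions: `count_and_sum` is a hypothetical helper, the three sample lines are made up to show its behavior, and the totals are copied from the output above and taken at face value. Since ZFS `used` on a parent dataset includes space used by its descendants, a recursive sum of `used` may count some space more than once.

```python
def count_and_sum(lines):
    """Count and sum integer values, one per line, skipping blank lines.

    Mirrors the Perl one-liner fed by `zfs get ... -o value`.
    """
    values = [int(line) for line in lines if line.strip()]
    return len(values), sum(values)


# Hypothetical sample input, only to show the helper's behavior.
sample_count, sample_sum = count_and_sum(["100\n", "200\n", "300\n"])
print("sample", sample_count, sample_sum)

# Totals copied from the session above.
referenced_total = 34437898240             # 11 file systems, `referenced`
used_total = 423907360768                  # 11 file systems, `used`
snapshot_referenced_total = 3392397172736  # 1083 snapshots, `referenced`

# Same subtraction and division as the two Perl commands.
snapshot_used = used_total - referenced_total
ratio = snapshot_referenced_total / snapshot_used

print("snapshot_used", snapshot_used)
print("ratio", ratio)
```

With the reported totals this gives 389,469,462,528 bytes and a ratio of roughly 8.71 to 1, matching the figures above.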


David

