Re: Limitations of rsnapshot-style backups (Was: Re: lazy old guy asks question)
On 9/2/25 11:12, Andy Smith wrote:
Hi,
Hello. :-)
On Tue, Sep 02, 2025 at 09:05:39AM -0400, Dan Ritter wrote:
David Christensen wrote:
a. Set the ZFS backup file system property "dedup". This will enable
block-level de-duplication, which can de-duplicate data more than hard links
alone.
This is generally not a good thing to recommend; one of the
authors of the system wrote a good article which should
definitely be read before turning on dedup:
https://despairlabs.com/blog/posts/2024-10-27-openzfs-dedup-is-good-dont-use-it/
I evaluated zfs dedup with my real data as part of deciding what to do,
and it became clear I would need to significantly increase the hardware
I was dedicating to the task and I would also need to re-think the
remote places I am storing further copies (for even more expense).
ZFS will use as little or as much hardware as you give it. My SOHO
servers are cobbled together from used 10+ year old entry-level server
parts, and I am very pleased with their price and performance. Going
down, some readers use Raspberry Pis. Going up, I seem to recall one
user with a mid-range dual-CPU rack server with an external drive bay
and 24 disks (?).
I have not ventured into remote ZFS, yet.
Having said that, there were other factors in my decision. The
cross-source dedup is not a huge factor. That is, dedup done on all data
across all hosts being backed up isn't amazing. My figures show that 1.6
TB in rsnapshot came out as 920 GB in restic. If we assume that turning
off zfs dedup loses the dedup between backup sources, but the
snapshotting continues to allow only diffs within the backups for each
source to be stored, then it's less than double the capacity needed. And
again, as I said, capacity isn't so difficult.
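
As a rough check of that estimate, the ratio of the two figures can be worked out directly. This is only a sketch: it assumes decimal units (1.6 TB = 1600 GB), which the message does not state, and `ratio` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough check of the "less than double" estimate.
# Assumption: decimal units, so 1.6 TB is taken as 1600 GB.

# ratio A B: print A/B rounded to two decimal places (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

rsnapshot_gb=1600   # 1.6 TB reported for rsnapshot
restic_gb=920       # 920 GB reported for restic

ratio "$rsnapshot_gb" "$restic_gb"   # prints 1.74
```

A factor of about 1.74 is consistent with "less than double", under the assumption that losing cross-source dedup would bring usage back to roughly the rsnapshot figure.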
So yeah in summary, probably don't even consider zfs dedup but
do consider zfs.
Thanks,
Andy
I am not sure that I understand your backup storage benchmarks. But, I
can share mine FWIW.
Compression and de-duplication are enabled on the backup file system:
2025-09-02 19:56:47 toor@f5 ~
# zfs get compression,dedup p5/backup
NAME       PROPERTY     VALUE   SOURCE
p5/backup  compression  on      local
p5/backup  dedup        verify  local
Here are the statistics for ZFS backups since June 2022 of my Debian
daily driver OS disk:
2025-09-02 17:45:16 toor@f5 ~
# zfs get -H -p refer,used p5/backup/laalaa.tracy.holgerdanske.com
p5/backup/laalaa.tracy.holgerdanske.com referenced 6222024704 -
p5/backup/laalaa.tracy.holgerdanske.com used 89901633536 -
2025-09-02 17:45:45 toor@f5 ~
# zfs get -H -p -r -o value -t snapshot referenced \
  p5/backup/laalaa.tracy.holgerdanske.com |
  perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "snapshots $n referenced_total $sum\n"'
snapshots 140 referenced_total 824925708288
2025-09-02 17:46:59 toor@f5 ~
# perl -e 'print 89901633536-6222024704, $/'
83679608832
2025-09-02 17:50:12 toor@f5 ~
# perl -e 'print 824925708288/83679608832, $/'
9.85814489099929
So:
* The most recent backup is 6,222,024,704 bytes.
* There are 140 snapshots of previous backups.
* The snapshots have an apparent size of 824,925,708,288 bytes.
* The snapshots use 83,679,608,832 bytes of storage.
* The snapshot compression plus de-duplication ratio is 9.85814489099929
to 1.
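
The arithmetic above can be collected into one small script. This is only a sketch using the figures already shown; the helper names `snap_used` and `snap_ratio` are made up for illustration, and awk is used only for the floating-point division.

```shell
#!/bin/sh
# Figures copied from the zfs output above (bytes).
referenced=6222024704          # most recent backup
used=89901633536               # file system including its snapshots
snap_referenced=824925708288   # sum of "referenced" over the 140 snapshots

# snap_used USED REFERENCED: storage attributed to the snapshots, in bytes.
snap_used() {
    echo $(( $1 - $2 ))
}

# snap_ratio APPARENT STORED: apparent size divided by stored size,
# rounded to two decimal places.
snap_ratio() {
    awk -v a="$1" -v s="$2" 'BEGIN { printf "%.2f\n", a / s }'
}

stored=$(snap_used "$used" "$referenced")
echo "snapshot storage: $stored bytes"
echo "ratio: $(snap_ratio "$snap_referenced" "$stored") to 1"
```

Note that `referenced` counts every block reachable from a snapshot, so blocks shared by several snapshots are counted once per snapshot in the total. The ratio therefore reflects sharing between snapshots as well as compression and de-duplication.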
Here are the statistics for ZFS backups since June 2022 of various Linux
OS disks, FreeBSD OS disks, Windows home directories, macOS home
directories, and factory-fresh USB flash drives:
2025-09-02 17:57:21 toor@f5 ~
# zfs get -r -H -p -t filesystem -o value referenced p5/backup |
  perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "filesystems $n referenced_total $sum\n"'
filesystems 11 referenced_total 34437898240
2025-09-02 17:57:33 toor@f5 ~
# zfs get -r -H -p -t filesystem -o value used p5/backup |
  perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "filesystems $n used_total $sum\n"'
filesystems 11 used_total 423907360768
2025-09-02 17:58:57 toor@f5 ~
# zfs get -H -p -r -o value -t snapshot referenced p5/backup |
  perl -e '$n=0; $sum=0; while (<STDIN>) {$n++; $sum += $_}; print "snapshots $n referenced_total $sum\n"'
snapshots 1083 referenced_total 3392397172736
2025-09-02 17:59:22 toor@f5 ~
# perl -e 'print 423907360768-34437898240, $/'
389469462528
2025-09-02 18:15:09 toor@f5 ~
# perl -e 'print 3392397172736/389469462528, $/'
8.71030337196748
So:
* The most recent backups are 34,437,898,240 bytes.
* There are 1,083 snapshots of previous backups.
* The snapshots have an apparent size of 3,392,397,172,736 bytes.
* The snapshots use 389,469,462,528 bytes of storage.
* The snapshot compression plus de-duplication ratio is 8.71030337196748
to 1.
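
For anyone without Perl at hand, the totals above could probably be produced with awk instead. A minimal sketch: `sum_values` is a hypothetical helper name, it assumes one byte count per input line (as printed by `zfs get -H -p -o value`), and the demo input at the end is made up purely to show the output format.

```shell
#!/bin/sh
# sum_values LABEL TOTAL_NAME: read one number per line on stdin and print
# the line count and the total on a single line, in the same format as the
# Perl one-liners above. "%.0f" keeps large byte counts out of exponent
# notation on awk implementations that would otherwise print e.g. 3.39e+12.
sum_values() {
    awk -v label="$1" -v name="$2" '
        { n++; sum += $1 }
        END { printf "%s %d %s %.0f\n", label, n, name, sum }
    '
}

# Demo with made-up numbers (not real backup data):
printf '%s\n' 100 200 300 | sum_values demo total   # prints "demo 3 total 600"

# On the backup host it would be used like this (requires ZFS, not run here):
#   zfs get -H -p -r -o value -t snapshot referenced p5/backup |
#     sum_values snapshots referenced_total
```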
David