
Re: Limitations of rsnapshot-style backups (Was: Re: lazy old guy asks question)



On 9/1/25 14:57, Karl Vogel wrote:
On Mon 01 Sep 2025 at 16:15:39 (-0400), David Christensen wrote:
a.  Set the ZFS backup file system property "dedup".  This will enable
block-level de-duplication, which can de-duplicate data more than hard
links alone.

   This option eats RAM like candy, so make sure you have plenty.


From what I have seen on FreeBSD ZFS, under load ZFS can consume as much memory as it needs. For storage servers, this is exactly what I want -- I paid for that memory, I want ZFS to use it. But I have little experience with ZFS on workstations, where many processes are competing for memory. AIUI there are tunables that cap ZFS memory use (the ARC size limit, for one), so you have options.
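
To put a rough number on the "eats RAM" remark: a commonly quoted rule of thumb is about 320 bytes of memory per dedup table entry, with one entry per unique block. The sketch below only does that arithmetic. The 320-byte figure, the assumption that every block is unique and a full record in size, and the function name are all assumptions for illustration; real usage depends on the pool and the ZFS version.

```python
# Back-of-the-envelope estimate of dedup table (DDT) memory.
# Assumptions (not measured on any real pool):
#   - about 320 bytes of RAM per DDT entry (commonly quoted rule of thumb)
#   - every block is unique and exactly one record in size

BYTES_PER_DDT_ENTRY = 320          # assumed rule-of-thumb value
DEFAULT_RECORDSIZE = 128 * 1024    # ZFS default recordsize, 128 KiB


def estimate_ddt_bytes(data_bytes, recordsize=DEFAULT_RECORDSIZE):
    """Return an approximate DDT size in bytes for data_bytes of unique data."""
    blocks = -(-data_bytes // recordsize)   # ceiling division
    return blocks * BYTES_PER_DDT_ENTRY


if __name__ == "__main__":
    one_tib = 1024 ** 4
    gib = estimate_ddt_bytes(one_tib) / 1024 ** 3
    print(f"1 TiB of unique 128 KiB blocks: about {gib:.1f} GiB of DDT")
```

Smaller blocks make it worse: the same 1 TiB stored in 8 KiB blocks has 16 times as many entries.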


b.  Set the ZFS backup file system property "compression".

   If you have large backup files, you can save more space by using "gzip"
   for compression.  On my backup box, this is for highly-compressible data
   like large (1-3 GB) text-formatted logs:

     Method   Best Compression Ratio
     -------------------------------
     gzip     8.07x
     lz4      5.83x

   "gzip" takes slightly longer to store a big file, but I don't notice
   any real delays when reading it.  And I'm not patient.


I agree that it is possible to choose an optimum compression algorithm for specific data, but that implies grouping the data according to compression algorithm.
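
One way to decide which group a given set of data belongs in is to try a sample first. The sketch below uses Python's zlib as a stand-in for ZFS "gzip" compression. That is an assumption: it is the same DEFLATE algorithm, but ZFS compresses record by record, so the numbers on a real pool will differ. lz4 is not in the Python standard library, so it is not covered here. The function name and the sample log line are made up.

```python
import zlib


def compression_ratio(data, level=6):
    """Return uncompressed size / compressed size for data at a zlib level."""
    if not data:
        raise ValueError("need at least one byte of sample data")
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    # Made-up, highly repetitive log text standing in for a real sample.
    sample = b"2025-09-01 14:57:00 host daemon[123]: connection accepted\n" * 20000
    for level in (1, 6, 9):
        print(f"zlib level {level}: {compression_ratio(sample, level):.1f}x")
```

In practice you would feed it a chunk read from a real file in each candidate file system rather than synthetic text.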


I already have a few top-level ZFS file systems that could benefit from this optimization -- archives, backup, cvs, images, ghost, samba, and virtualbox. I will definitely consider it (and some other ideas) the next time I rebuild.


3.  zfs-diff(8) -- for example, to determine the backed up directories
and files whose metadata and/or data have changed between two snapshots:

   https://bezoar.org/src/zfs-snapshots/ describes using this for faster
   incremental backups, even on spinning rust.


If I am understanding the article correctly, the author wrote a script that runs zfs diff on a ZFS file system against its last snapshot and copies the changed files to another file system (?). I can see how this could be useful if the author uses zfs-auto-snapshot(8) to take daily snapshots and wants to save modified files more frequently, on demand. I think I would instead write a script that runs zfs-auto-snapshot(8) on demand and encodes the current date-time in the snapshot name. The author's approach makes it easy to see what changed, though, while my approach would require another script to list only the files that changed. TIMTOWTDI.
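
A minimal sketch of the two pieces mentioned above: a snapshot name that encodes the current date-time, and a filter that lists only the changed paths from "zfs diff -H" output. It assumes the tab-separated format that -H produces (change type, then path, with a second path for renames). The dataset name, the "manual" prefix, and the function names are hypothetical, and the code only builds and parses text; it does not run zfs itself.

```python
from datetime import datetime


def snapshot_name(dataset, now=None, prefix="manual"):
    """Build dataset@prefix-YYYY-MM-DD_HHMMSS for an on-demand snapshot."""
    now = now or datetime.now()
    return f"{dataset}@{prefix}-{now:%Y-%m-%d_%H%M%S}"


def changed_paths(diff_text):
    """Return (change_type, path) pairs from `zfs diff -H` style output.

    Change types are M (modified), + (created), - (removed), R (renamed).
    For renames the new path is reported.
    """
    changes = []
    for line in diff_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue                      # skip blank or unexpected lines
        kind, path = fields[0], fields[-1]
        changes.append((kind, path))
    return changes


if __name__ == "__main__":
    # Hypothetical dataset name and hand-written sample diff output.
    print(snapshot_name("tank/home"))
    sample = "M\t/tank/home/notes.txt\nR\t/tank/home/old\t/tank/home/new\n"
    for kind, path in changed_paths(sample):
        print(kind, path)
```

Paths that zfs diff has escaped (unusual or non-ASCII characters) are passed through as-is and not decoded here.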


David

