On 9/1/25 14:57, Karl Vogel wrote:
> On Mon 01 Sep 2025 at 16:15:39 (-0400), David Christensen wrote:
>> a. Set the ZFS backup file system property "dedup". This will enable block-level de-duplication, which can de-duplicate data more than hard links alone.
>
> This option eats RAM like candy, so make sure you have plenty.
From what I have seen on FreeBSD, ZFS under load will consume as much memory as it needs. For storage servers, this is exactly what I want -- I paid for that memory, I want ZFS to use it. But I have little experience with ZFS on workstations, where many processes are competing for memory. AIUI there are tunables for ZFS, so you have options.
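To put a rough number on the RAM question, here is a minimal Python sketch that estimates the in-core size of the dedup table. It assumes the commonly quoted rule of thumb of about 320 bytes per unique block, which is an assumption rather than a measured figure for any particular pool or ZFS release. The function name and the default 128 KiB block size are chosen here for illustration.

```python
# Rough estimate of the RAM needed to hold a ZFS dedup table (DDT) in core.
# ASSUMPTION: ~320 bytes per unique block is a commonly quoted rule of thumb,
# not a measured figure for any particular pool or ZFS version.

DDT_BYTES_PER_ENTRY = 320  # assumed rule-of-thumb size of one in-core entry


def estimate_ddt_ram_bytes(unique_data_bytes, avg_block_bytes=128 * 1024):
    """Return the approximate in-core DDT size for the given unique data."""
    if unique_data_bytes < 0 or avg_block_bytes <= 0:
        raise ValueError("unique data must be >= 0 and block size must be > 0")
    # One DDT entry per unique block; round up for a partial final block.
    unique_blocks = -(-unique_data_bytes // avg_block_bytes)
    return unique_blocks * DDT_BYTES_PER_ENTRY


if __name__ == "__main__":
    TIB = 1024 ** 4
    GIB = 1024 ** 3
    for block_kib in (128, 16):
        need = estimate_ddt_ram_bytes(TIB, block_kib * 1024)
        print(f"1 TiB unique data, {block_kib} KiB blocks: {need / GIB:.1f} GiB")
```

Under that assumption, 1 TiB of unique data works out to roughly 2.5 GiB of table at 128 KiB blocks and roughly 20 GiB at 16 KiB blocks, so the average block size matters as much as the pool size.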
>> b. Set the ZFS backup file system property "compression".
>
> If you have large backup files, you can save more space by using "gzip" for compression. On my backup box, this is for highly-compressible data like large (1-3Gb) text-formatted logs:
>
>   Method   Best Compression Ratio
>   -------------------------------
>   gzip     8.07x
>   lz4      5.83x
>
> "gzip" takes slightly longer to store a big file, but I don't notice any real delays when reading it. And I'm not patient.
I agree that it is possible to choose an optimum compression algorithm for specific data, but that implies grouping the data according to compression algorithm.
I already have a few top-level ZFS file systems that could benefit from this optimization -- archives, backup, cvs, images, ghost, samba, and virtualbox. I will definitely consider it (and some other ideas) the next time I rebuild.
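One way to decide which file systems are worth grouping under gzip is to measure a sample of each before rebuilding. The Python sketch below is a rough illustration only. It uses the standard-library zlib module (the DEFLATE algorithm behind gzip) and does not cover lz4, which is not in the standard library. The sample log lines are a made-up stand-in for real data, and the function names are hypothetical.

```python
import zlib


def compression_ratio(data, level=6):
    """Return uncompressed size divided by DEFLATE-compressed size."""
    if not data:
        raise ValueError("need a non-empty sample")
    return len(data) / len(zlib.compress(data, level))


def sample_ratios(data, levels=(1, 6, 9)):
    """Map each DEFLATE level to the ratio achieved on the sample."""
    return {level: compression_ratio(data, level) for level in levels}


if __name__ == "__main__":
    # Hypothetical stand-in for a text log; substitute a real sample, e.g.
    # the first few MiB of a file from the file system under consideration.
    sample = b"".join(
        b"2025-09-01T14:%02d:%02d host sshd[%d]: session opened for user u%d\n"
        % (i // 60 % 60, i % 60, 1000 + i, i % 7)
        for i in range(20000)
    )
    for level, ratio in sample_ratios(sample).items():
        print(f"zlib level {level}: {ratio:.2f}x")
```

ZFS compresses each record separately, so a whole-file ratio measured this way will likely be somewhat optimistic compared with what the compressratio property reports afterwards.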
>> 3. zfs-diff(8) -- for example, to determine the backed up directories and files whose metadata and/or data have changed between two snapshots:
>
> https://bezoar.org/src/zfs-snapshots/ describes using this for faster incremental backups, even on spinning rust.
If I understand the article correctly, the author wrote a script that runs zfs-diff(8) on a ZFS file system against its last snapshot and copies the changed files to another file system (?). I can see how this could be useful if the author uses zfs-auto-snapshot(8) to take daily snapshots and wants to save modified files more frequently, on demand. I think I would instead write a script that runs zfs-auto-snapshot(8) on demand and encodes the current date-time in the snapshot name. But the author's approach makes it easy to see what changed, while my approach would require another script to list only the files that changed. TIMTOWTDI.
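Both pieces can be sketched in a few lines of Python. This is my own illustration and not the script from the article: the dataset name "tank/backup", the "ondemand" prefix, and all function names are hypothetical. It assumes the tab-separated output of `zfs diff -H`, where the first field is M, +, -, or R and a rename carries the new path in a third field.

```python
import subprocess
from datetime import datetime

CHANGE_KINDS = {"M": "modified", "+": "added", "-": "removed", "R": "renamed"}


def snapshot_name(dataset, when, prefix="ondemand"):
    """Build a name such as 'tank/backup@ondemand-20250901-145700'."""
    return f"{dataset}@{prefix}-{when:%Y%m%d-%H%M%S}"


def parse_zfs_diff(text):
    """Parse `zfs diff -H` output into (kind, path, new_path) tuples."""
    changes = []
    for line in text.split("\n"):
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        kind = CHANGE_KINDS.get(fields[0], fields[0])
        new_path = fields[2] if len(fields) > 2 else None
        changes.append((kind, fields[1], new_path))
    return changes


def paths_present_after(changes):
    """Paths that still exist after the change (files and directories alike,
    since output without -F does not say which is which)."""
    paths = []
    for kind, path, new_path in changes:
        if kind in ("added", "modified"):
            paths.append(path)
        elif kind == "renamed" and new_path:
            paths.append(new_path)
    return paths


def diff_since(snapshot):
    """Run `zfs diff -H` for the live file system against `snapshot`."""
    result = subprocess.run(
        ["zfs", "diff", "-H", snapshot],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_diff(result.stdout)
```

The parsing is kept separate from the `zfs` call so it can be tried on saved output without a pool at hand. Escaped characters in unusual file names are not decoded here.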
David