
Re: Backup systems



On 9/5/23 07:34, Michael Kjörling wrote:
On 4 Sep 2023 13:57 -0700, from dpchrist@holgerdanske.com (David Christensen):
* I am using zfs-auto-snapshot(8) for snapshots.  Are you using rsnapshot(1)
for snapshots?

No. I'm using ZFS snapshots on the source, but not for backup
purposes. (I have contemplated doing that, but it would increase
complexity a fair bit.) The backup target is not snapshotted at the
block storage or file system level; however, rsync --link-dest uses
hardlinks to deduplicate whole files.
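
As a rough illustration of the whole-file deduplication described above, here is a minimal Python sketch of the idea behind rsync --link-dest. It is not rsync's actual algorithm: the function name backup_with_link_dest and its parameters are hypothetical, and it compares file contents, whereas rsync normally decides by file size and modification time.

```python
import filecmp
import os
import shutil


def backup_with_link_dest(source, dest, link_dest=None):
    """Copy `source` into `dest`; hardlink files unchanged since `link_dest`.

    Hypothetical helper for illustration only. Unchanged files share one
    inode with the previous backup, so they occupy disk space only once.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(target_dir, name)
            old = None
            if link_dest is not None:
                old = os.path.normpath(os.path.join(link_dest, rel, name))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: add a second directory entry for the same inode.
                os.link(old, new)
            else:
                # New or changed: store a fresh copy of the whole file.
                shutil.copy2(src, new)
```

Because the deduplication is per file, changing one byte of a large file stores the entire file again in the next backup.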


+1 for complexity of ZFS backups via snapshots and replication.


My question was incongruous, as "snapshot" has different meanings for ZFS and rsnapshot(1):

*   https://docs.oracle.com/cd/E18752_01/html/819-5461/ftyue.html

    snapshot

        A read-only copy of a file system or volume at a given point in
        time.

*   https://rsnapshot.org/rsnapshot/docs/docbook/rest.html

    Using rsnapshot, it is possible to take snapshots of your
    filesystems at different points in time.


As I understand your network topology and backup strategy, it appears that you are using rsnapshot(1) for snapshots (in the rsnapshot(1) sense of the term).


* du(1) of the backup file system matches ZFS properties 'referenced' and
'usedbydataset'.

This would be expected, depending on exact specifics (what data du
traverses over and what your ZFS dataset layout is). To more closely
match the _apparent_ size of the files, you'd look at e.g.
logicalreferenced or logicalused.

* I am unable to correlate du(1) of the snapshots to any ZFS properties --
du(1) reports much more storage than ZFS 'usedbysnapshots', even when scaled
by 'compressratio'.

This would also be expected, as ZFS snapshots are copy-on-write and
thus in effect only bookkeep a delta, whereas du counts the apparent
size of all files accessible under a path and ZFS snapshots allow
access to all files within the file system as they appeared at the
moment the snapshot was created. There are nuances and caveats
involved but, as a first approximation, immediately after taking a ZFS
snapshot the size of the snapshot is zero (plus a small amount of
metadata overhead for the snapshot itself) regardless of the size of
the underlying dataset, and the apparent size of the snapshot grows as
changes are made to the underlying dataset which cause some data to be
referenced only by the snapshot.

In general, ZFS disk space usage accounting for snapshots is really
rather non-intuitive, but it does make more sense when you consider
that ZFS is a copy-on-write file system and that snapshots largely
boil down to an atomic point-in-time marker for dataset state.
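
The gap between du(1) over snapshots and 'usedbysnapshots' can be reproduced with a toy model. The sketch below is only an approximation under loud assumptions: ToyDataset and all of its methods are hypothetical, each file is treated as a single block, and compression, deduplication, and metadata overhead are ignored. It is not how ZFS is implemented.

```python
class ToyDataset:
    """Toy copy-on-write dataset: each file maps to one immutable block."""

    def __init__(self):
        self.live = {}        # file name -> (block id, size)
        self.snapshots = {}   # snapshot name -> frozen copy of self.live
        self._next_block = 0

    def write(self, name, size):
        # Copy-on-write: a (re)written file always gets a brand-new block.
        self.live[name] = (self._next_block, size)
        self._next_block += 1

    def snapshot(self, snap_name):
        # A snapshot is just a point-in-time record of the block references.
        self.snapshots[snap_name] = dict(self.live)

    def referenced(self):
        # Space reachable from the live file system.
        return sum(size for _block, size in self.live.values())

    def used_by_snapshots(self):
        # Space held only by snapshots: freed if all snapshots were destroyed.
        live_blocks = {block for block, _size in self.live.values()}
        snap_blocks = {}
        for files in self.snapshots.values():
            for block, size in files.values():
                snap_blocks[block] = size
        return sum(s for b, s in snap_blocks.items() if b not in live_blocks)

    def du_of_snapshot(self, snap_name):
        # What du would report when walking one snapshot's directory tree.
        return sum(size for _block, size in self.snapshots[snap_name].values())


ds = ToyDataset()
ds.write("a", 100)
ds.write("b", 50)
ds.snapshot("s1")
print(ds.used_by_snapshots())   # 0: the snapshot shares every block with live
ds.write("a", 100)              # overwrite: the old block of "a" is now snapshot-only
ds.snapshot("s2")
print(ds.used_by_snapshots())   # 100
print(ds.du_of_snapshot("s1") + ds.du_of_snapshot("s2"))  # 300
```

In this model du over the two snapshots adds up to 300 units while only 100 units are held exclusively by snapshots, which has the same shape as the discrepancy reported above.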


Okay. My server contains one backup ZFS file system for each host on my network. So, the 'logicalreferenced', 'logicalused', and 'usedbysnapshots' properties I posted for one host's backup file system are affected by the pool-wide COW, compression, and/or deduplication features.


(In ZFS, a dataset can be either a file system optionally exposed at a
directory mountpoint or a volume exposed as a block device.)


I try to use ZFS vocabulary per the current Oracle WWW documentation (but have found discrepancies). I wonder if ZFS-on-Linux and/or OpenZFS have diverged (e.g. 'man zfs' on Debian, etc.):

    https://docs.oracle.com/cd/E18752_01/html/819-5461/ftyue.html

    "A generic name for the following ZFS components: clones, file
    systems, snapshots, and volumes."


David

