
Re: Limitations of rsnapshot-style backups (Was: Re: lazy old guy asks question)



On 8/31/25 16:05, Andy Smith wrote:
On Sun, Aug 31, 2025 at 12:20:05PM -0000, Greg wrote:
On 2025-08-29, Andy Smith <andy@strugglers.net> wrote:
For non-trivial
amounts of files I would not recommend rsnapshot or any other
rsync-based backup system in 2025.

Can we know why not (rsnapshot)

I've written about it at some length before, here:

     https://lists.debian.org/msgid-search/ZwWunVcpkuUEvnpC@mail.bitfolk.com

I'm actually in the middle of transitioning all my backups that are in
rsnapshot to something else, mainly for performance reasons, and will
write up a blog post about that in the next few days.

It's mostly the inherent limitations of using hardlinks for backups; a
distant second is the weak deduplication.

1. Every version of every file is one extra hardlink

This is the killer for non-trivial use of rsync-based backup methods.
Traversing a directory tree of millions of inodes is expensive.

2. You can only deduplicate on whole files at exactly matching paths
    with identical metadata

This is a lesser problem. It's just storage capacity, right? And that's
fairly cheap.

and your definition of non-trivial?

It's going to depend on the number of files backed up and the class of
the hardware that you're willing to throw at it. I want to stress that
rsnapshot is great until it isn't. If you don't have backups it's a
really good way to start and it may well suffice for a person's uses
forever. It's simple and simple is good.

As part of the way rsnapshot and similar rsync-plus-hardlinks schemes
work, they have to scan the entire file tree of the previous backup in
order to either transfer a new copy of each file (if it changed) or do a
hardlink to the previous copy (if it remained the same). So firstly when
you start to notice that each run spends an unacceptable amount of time
doing this, it's a sign that your backups have gotten too big for this
design.
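
One common way to implement this scheme is rsync's --link-dest option:
unchanged files become hard links to the previous backup, and only
changed files are transferred. A rough hand-rolled sketch (paths and
host are illustrative, not rsnapshot's exact invocation):

# Compare the source against the previous backup; hard link anything
# unchanged into the new backup directory instead of copying it.
rsync -a --delete \
    --link-dest=/backup/daily.1/foo.example.com/ \
    foo.example.com:/ \
    /backup/daily.0/foo.example.com/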

It's a personal decision how long you are willing to have a backup run
take, but you can quickly reach a point where the time spent scanning
way exceeds the time spent transferring any data, and at some point
other methods of backup will be much quicker than rsnapshot-style.

The other pain point is for any kind of management operation you want to
do on larger parts of the backup tree, like work out where all the space
went (you backed up things you didn't mean to back up). Then things get
really slow as it's the same problem of traversing a large file tree but
multiplied by however many levels of it that you need to consider.

For example, think about how you will determine the actual disk space
used by the backups from the host foo.example.com.

$ sudo time du -sh daily.0/foo.example.com
22G     daily.0/foo.example.com
5.81user 41.96system 2:10.06elapsed 36%CPU (0avgtext+0avgdata 51320maxresident)k
5254528inputs+0outputs (56major+57005minor)pagefaults 0swaps

Great, but how much changed between that day and the one before it?

$ sudo time du -sh daily.1/foo.example.com
22G     daily.1/foo.example.com
5.40user 39.05system 2:05.36elapsed 35%CPU (0avgtext+0avgdata 51384maxresident)k
5250480inputs+0outputs (43major+57020minor)pagefaults 0swaps

So they're the same and nothing changed? No.

$ sudo time du -sh daily.{0,1}/foo.example.com
22G     daily.0/foo.example.com
513M    daily.1/foo.example.com
11.00user 80.76system 4:12.87elapsed 36%CPU (0avgtext+0avgdata 51368maxresident)k
10521304inputs+0outputs (1254major+66533minor)pagefaults 0swaps

So there's 513M of changed data between these two subdirectories. Note
how:

- It took several minutes to get answers to these simple questions even
   though these file trees are on SSDs, not HDDs
- That time scaled mostly linearly; there apparently wasn't much caching
- We still don't know exactly which files changed

We can at least tell whether a given pair of files (e.g.
daily.0/foo.example.com/home/andy/.bash_profile vs
daily.1/foo.example.com/home/andy/.bash_profile) is identical without
having to examine their whole content, because if one is hard linked to
the other then the inode numbers will be the same. But even so, telling
*what* changed still requires a full stat of two directory trees.
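
As a hedged illustration (using the same paths as above), the inode
comparison can be done with stat(1), and find(1)'s link count test gets
at "new or changed in the latest run" without reading file contents --
but it is still a full tree walk:

# Same inode number means the two backups share a single copy of the file.
sudo stat -c '%i %n' daily.{0,1}/foo.example.com/home/andy/.bash_profile

# Files in the newest backup that are not hard linked anywhere else yet,
# i.e. files that were new or changed in that run.
sudo find daily.0/foo.example.com -type f -links 1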

Other backup systems can make it easier and faster to get answers to
these sorts of questions without waiting minutes or hours, because they
store more metadata about what they did. That is at the expense of them
being more complicated.
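
For comparison, a repository-based tool can answer the same questions
from its own index. With restic, for instance (the repository path and
snapshot IDs here are placeholders):

# Which files differ between two snapshots, from repository metadata alone.
restic -r /srv/backups/restic diff 1a2b3c4d 5e6f7a8b

# Deduplicated size of the data referenced by one snapshot.
restic -r /srv/backups/restic stats --mode raw-data 1a2b3c4d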

Lastly, on the poor man's deduplication using hardlinks. Like I say,
you can just throw hardware at this, since capacity is fairly cheap to
scale compared to the random access time of files. Some hard figures:

The rsnapshot system I'm retiring appears to have 1.6T of data in it.
This is on a btrfs filesystem with zstd:1 compression, and `compsize`
says that the uncompressed size is 2.16T (a lot of things aren't very
compressible). However, if hard links were counted as full copies of
each other, the tree would reference over 14T of data, so the hard
linking is doing quite a good job.
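
(For reference, a btrfs measurement like the one above can be taken
with something like the following; the mount point is illustrative:)

# Reports disk usage vs. uncompressed size for a btrfs tree.
sudo compsize /srv/rsnapshot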

I've painstakingly imported all this into a restic backup repository,
using one restic snapshot for each individual rsnapshot backup, i.e.
there's one restic snapshot for daily.0/foo.example.com and another for
daily.1/foo.example.com and so on for every host backed up at every
interval of rsnapshot. What is in restic now is exactly what is in
rsnapshot. In restic it takes up 920G, not 1.6T.
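
Conceptually, each of those imports is a single restic backup run over
one rsnapshot directory, something like the sketch below (repository
path, host and timestamp are illustrative, not the actual commands
used):

# One restic snapshot per rsnapshot backup directory; --time backdates
# the snapshot to roughly when the rsnapshot run originally happened.
restic -r /srv/backups/restic backup \
    --host foo.example.com \
    --time "2025-08-30 03:00:00" \
    daily.1/foo.example.com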

1.60T vs 0.92T wasn't really the issue for me, but I had run out of
storage capacity so something had to be done for backups to continue.
Since I was going to rebuild it anyway, I chose different tactics, but
the real reason was the difficulty of managing a tree of hundreds of
millions of mostly hardlinks. I didn't want to rebuild and end up with
something that still had those problems.

Thanks,
Andy


Thank you for explaining the scaling problems of backup solutions such as rsnapshot(1) [1] that use a traditional file system for backup storage and hard links for de-duplication.


I would add:

1. The killer feature of the rsync-plus-hardlinks approach is that backups can be accessed by any program or script that can access the file system -- ls(1), find(1), cat(1), grep(1), diff(1), sh(1), awk(1), sed(1), perl(1), etc. This allows the user and the system administrator to use familiar tools, and provides maximum flexibility.

2. A disadvantage of the rsync-plus-hardlinks approach is that backups can be accessed by any program or script that can access the file system. Beyond security concerns, an incorrect program or script, or incorrect usage of such, can damage backups (whenever the backup file system is mounted read-write).


Regarding backup solutions such as restic(1) [2] that use a command-line interface (CLI) and a blob for backup storage:

1. The killer feature of the cli-plus-blob approach is that the backups are encapsulated and access is controlled by the CLI. This can improve security and prevent damage.

2. A disadvantage of the cli-plus-blob approach is that the backups are encapsulated and access is controlled by the CLI. The user and system administrator are at the mercy of the CLI author for access to backups. The backup solution author is faced with re-implementing userland-equivalent commands in the CLI and/or providing an API for program/script access.


I have found that using a ZFS file system [3] for backup storage provides a better mix of advantages and disadvantages (a combined command sketch follows the numbered steps):

1. For encryption, build the ZFS pool on top of encrypted providers -- e.g. partitions with LUKS, etc. (The last time I checked, OpenZFS native encryption has open issues that make me consider it unsuitable for production.)

2. Create one ZFS backup file system for each live file system to be backed up:

a. Set the ZFS backup file system property "dedup". This will enable block-level de-duplication, which can de-duplicate more data than hard links alone.

b. Set the ZFS backup file system property "compression". For suitable data, this can save even more storage space.

3. Use whatever means (rsync(1), etc.) to copy data from the live file system to the backup file system. Exclude copying data that should not be backed up. Delete data in the backup file system that is no longer in the live file system.

4. Take a ZFS snapshot of the backup file system.

5. Snapshots are mounted hidden and read-only at ".zfs/snapshot/*" under the backup file system mountpoint.
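
For concreteness, a minimal sketch of steps 1 through 4 (the device,
pool and dataset names, the exclude list, and the snapshot name are all
illustrative, not my actual setup):

# 1. Encrypted provider underneath the pool (LUKS on an example partition).
cryptsetup luksFormat /dev/sdb2
cryptsetup open /dev/sdb2 backup_crypt
zpool create tank /dev/mapper/backup_crypt

# 2. One backup file system per live file system, with dedup and compression.
zfs create tank/backup
zfs create -o dedup=on -o compression=on tank/backup/foo.example.com

# 3. Copy the live data in, excluding what should not be backed up and
#    deleting anything no longer present on the source.
rsync -aH --delete \
    --exclude=/proc --exclude=/sys --exclude=/run --exclude=/tmp \
    foo.example.com:/ /tank/backup/foo.example.com/

# 4. Snapshot the result; it appears read-only under .zfs/snapshot/.
zfs snapshot tank/backup/foo.example.com@$(date +%F)
ls /tank/backup/foo.example.com/.zfs/snapshot/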


Regarding the backup administration queries you mention, ZFS provides commands and properties that run much faster than crawling the backup file system and/or snapshots:

1. zfs-list(8) -- for example, to list the backup snapshots for my Debian daily driver OS disk:

2025-09-01 11:53:22 toor@f5 ~
# zfs list -r -t snapshot p5/backup/laalaa.tracy.holgerdanske.com
NAME                                                                      USED  AVAIL  REFER  MOUNTPOINT
p5/backup/laalaa.tracy.holgerdanske.com@zfs-auto-snap_m-2022-06-01-03h21  821M      -  4.89G  -
p5/backup/laalaa.tracy.holgerdanske.com@zfs-auto-snap_m-2022-07-01-03h21  745M      -  5.22G  -
p5/backup/laalaa.tracy.holgerdanske.com@zfs-auto-snap_m-2022-08-01-03h21  407M      -  4.34G  -
<snip 136 lines>

2. zfs-get(8) -- for example, to determine the on-disk size of the backup file system and snapshots for my Debian daily driver OS disk:

2025-09-01 12:42:37 toor@f5 ~
# zfs get name,refer,usedbysnapshots p5/backup/laalaa.tracy.holgerdanske.com
NAME                                     PROPERTY         VALUE                                    SOURCE
p5/backup/laalaa.tracy.holgerdanske.com  name             p5/backup/laalaa.tracy.holgerdanske.com  -
p5/backup/laalaa.tracy.holgerdanske.com  referenced       5.79G                                    -
p5/backup/laalaa.tracy.holgerdanske.com  usedbysnapshots  77.1G                                    -

3. zfs-diff(8) -- for example, to determine the backed up directories and files whose metadata and/or data have changed between two snapshots:

2025-09-01 12:45:24 toor@f5 ~
# zfs diff p5/backup/laalaa.tracy.holgerdanske.com@zfs-auto-snap_d-2025-08-31-03h09 p5/backup/laalaa.tracy.holgerdanske.com@zfs-auto-snap_d-2025-09-01-03h09
M	/var/local/backup/laalaa.tracy.holgerdanske.com/dev
M	/var/local/backup/laalaa.tracy.holgerdanske.com/etc
M	/var/local/backup/laalaa.tracy.holgerdanske.com/proc
<snip 889 lines>


David


[1] https://manpages.debian.org/trixie/rsnapshot/rsnapshot.1.en.html

[2] https://manpages.debian.org/trixie/restic/restic.1.en.html

[3] https://openzfs.github.io/openzfs-docs/man/index.html

