Limitations of rsnapshot-style backups (Was: Re: lazy old guy asks question)
Hi,
On Sun, Aug 31, 2025 at 12:20:05PM -0000, Greg wrote:
> On 2025-08-29, Andy Smith <andy@strugglers.net> wrote:
> >
> > I have more than 20 years of experience using rsnapshot. For non-trivial
> > amounts of files I would not recommend rsnapshot or any other
> > rsync-based backup system in 2025. "Non-trivial" is still a pretty large
> > amount though and rsnapshot does have the extremely desirable feature of
> > being very very simple, so could still be a solid choice.
>
> Can we know why not (rsnapshot)
I've written about it at some length before, here:
https://lists.debian.org/msgid-search/ZwWunVcpkuUEvnpC@mail.bitfolk.com
I'm actually in the middle of transitioning all my backups that are in
rsnapshot to something else, mainly for performance reasons, and will
write up a blog post about that in the next few days.
It's mostly the inherent limitations of using hardlinks for backups; a
distant second is the weak deduplication.
1. Every version of every file is one extra hardlink
This is the killer for non-trivial use of rsync-based backup methods.
Traversing a directory tree of millions of inodes is expensive (see the
example just after this list).
2. You can only deduplicate on whole files at exactly matching paths
with identical metadata
This is a lesser problem. It's just storage capacity, right? And that's
fairly cheap.
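To get a feel for point 1 on your own system, you can count the directory
entries in one backup tree and how many distinct inodes they boil down to
(the daily.0 path here is just illustrative):
$ sudo find daily.0 -printf '%i\n' | wc -l
$ sudo find daily.0 -printf '%i\n' | sort -u | wc -l
The first number is how many entries have to be stat'ed on every backup run;
the second is how many distinct files are actually stored. With a few
intervals of retention the first number can easily run into the hundreds of
millions.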
> and your definition of non-trivial?
It's going to depend on the number of files backed up and the class of
the hardware that you're willing to throw at it. I want to stress that
rsnapshot is great until it isn't. If you don't have backups it's a
really good way to start and it may well suffice for a person's uses
forever. It's simple and simple is good.
As part of the way rsnapshot and similar rsync-plus-hardlinks schemes
work, they have to scan the entire file tree of the previous backup in
order to either transfer a new copy of each file (if it changed) or do a
hardlink to the previous copy (if it remained the same). So, firstly, when
you start to notice that each run spends an unacceptable amount of time
doing this, it's a sign that your backups have grown too big for this
design.
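To make that concrete: with the link_dest option enabled, each rsnapshot run
boils down to an rsync invocation roughly like this (simplified, paths
illustrative; without link_dest it's a `cp -al` of the previous snapshot
followed by an rsync, which is the same tree walk by another name):
$ rsync -a --delete --numeric-ids \
      --link-dest=/backups/daily.1/foo.example.com/ \
      foo.example.com:/home/ /backups/daily.0/foo.example.com/home/
Every unchanged file still has to be stat'ed and compared against the
--link-dest copy before rsync can decide to hardlink it rather than transfer
it, and that is where the scan time goes.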
It's a personal decision how long you are willing to have a backup run
take, but you can quickly reach a point where the time spent scanning
way exceeds the time spent transferring any data, and at some point
other methods of backup will be much quicker than rsnapshot-style.
The other pain point is any kind of management operation you want to do on
larger parts of the backup tree, like working out where all the space went
(you backed up things you didn't mean to back up). Then things get really
slow, as it's the same problem of traversing a large file tree, but
multiplied by however many levels of it you need to consider.
For example, think about how you will determine the actual disk space
used by the backups from the host foo.example.com.
$ sudo time du -sh daily.0/foo.example.com
22G daily.0/foo.example.com
5.81user 41.96system 2:10.06elapsed 36%CPU (0avgtext+0avgdata 51320maxresident)k
5254528inputs+0outputs (56major+57005minor)pagefaults 0swaps
Great, but how much changed between that day and the one before it?
$ sudo time du -sh daily.1/foo.example.com
22G daily.1/foo.example.com
5.40user 39.05system 2:05.36elapsed 35%CPU (0avgtext+0avgdata 51384maxresident)k
5250480inputs+0outputs (43major+57020minor)pagefaults 0swaps
So they're the same and nothing changed? No.
$ sudo time du -sh daily.{0,1}/foo.example.com
22G daily.0/foo.example.com
513M daily.1/foo.example.com
11.00user 80.76system 4:12.87elapsed 36%CPU (0avgtext+0avgdata 51368maxresident)k
10521304inputs+0outputs (1254major+66533minor)pagefaults 0swaps
So there's 513M of changed data between these two subdirectories (du counts
each inode only once per invocation, so the second tree only adds data that
isn't shared with the first). Note how:
- It took several minutes to get answers to these simple questions even
though these file trees are on SSDs, not HDDs
- That time scaled mostly linearly; there apparently wasn't much caching
- We still don't know exactly which files changed
We can at least tell whether any pair of files (e.g.
daily.0/foo.example.com/home/andy/.bash_profile vs
daily.1/foo.example.com/home/andy/.bash_profile) are identical without
having to examine their whole content, because if one is hard linked to the
other then the inode numbers will be the same. But even so, telling *what*
changed requires a full stat of two directory trees.
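For a single known path that check is quick (again, illustrative paths):
$ stat -c '%i' daily.{0,1}/foo.example.com/home/andy/.bash_profile
If the two inode numbers printed are the same, the file is unchanged between
those snapshots. Answering "what changed anywhere?" means doing that for
every path in both trees, which is exactly the expensive traversal above.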
Other backup systems can make it easier and faster to get answers to
these sorts of questions without waiting minutes or hours, because they
store more metadata about what they did. That is at the expense of them
being more complicated.
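As one illustration (and not the only option), restic, which comes up below,
can answer this sort of question from the repository's own metadata rather
than by walking a file tree, e.g. with placeholder snapshot IDs:
$ restic diff 1a2b3c4d 5e6f7a8b
$ restic stats --mode raw-data 5e6f7a8b
The first lists exactly which files were added, removed or modified between
two snapshots; the second reports the deduplicated size of the data a
snapshot references.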
Lastly, on to the poor man's deduplication using hardlinks. Like I say, you
can just throw hardware at this, since capacity is fairly cheap to scale
compared to the random access time of files. Some hard figures:
The rsnapshot system I'm retiring appears to have 1.6T of data in it.
This is on a btrfs filesystem with zstd:1 compression, and `compsize`
says that the uncompressed size is 2.16T (a lot of things aren't very
compressible). However, if hard links were considered as full copies of
each other then it references over 14T of data, so the hard linking is
doing quite a good job.
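Figures like those can be had with something like the following (path
illustrative):
$ sudo compsize /backups
$ sudo du -sh --count-links /backups
compsize reports on-disk versus uncompressed sizes, and GNU du's
--count-links option counts hardlinked files once per link rather than once
per inode, which gives you the "if hard links were full copies" number.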
I've painstakingly imported all this into a restic backup repository,
using one restic snapshot for each individual rsnapshot backup, i.e.
there's one restic snapshot for daily.0/foo.example.com and another for
daily.1/foo.example.com and so on for every host backed up at every
interval of rsnapshot. What is in restic now is exactly what is in
rsnapshot. In restic it takes up 920G, not 1.6T.
1.60T vs 0.92T wasn't really the issue for me, but I had run out of storage
capacity, so something had to be done for backups to continue. Since I was
going to rebuild it anyway, I chose different tactics, but the real reason
was the difficulty of managing a tree of hundreds of millions of files, most
of them hardlinks. I didn't want to rebuild and end up with something that
still had those problems.
Thanks,
Andy