On 9/5/23 17:39, Default User wrote:
On Tue, 2023-09-05 at 20:01 -0400, Default User wrote:
Now 'sudo du -sh /' says that / seems to be using about 30 GB. But 'sudo du -sh /media/user/rsnapshot_backups_of_host' says that the backup directory, /media/user/rsnapshot_backups_of_host on backup drive A, is using a whopping 88 GB for 24 hourly, 7 daily, and 3 weekly backups!
That is better than (24+7+3) * 30 GB = 1020 GB. 88 GB - 30 GB = 58 GB of churn over 24 hours, 7 days, and/or 3 weeks may be reasonable for your workload. Are you doing multimedia content creation? Databases? Disk imaging? Anything else big?
I am thinking that CAN'T be right. Maybe each hard link is being counted as a full, actual file when adding up the space allegedly used. So, how can I determine how much space is really being used for the backups?
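A key point here: du(1) counts each inode only once per invocation, so hard-linked snapshot copies are only double-counted when the snapshot directories are measured in separate du runs. A minimal sketch with throwaway directories (not your real backup tree):

```shell
# Simulate two rsnapshot-style snapshots sharing one file via a hard link.
work=$(mktemp -d)
mkdir -p "$work/hourly.0" "$work/hourly.1"
dd if=/dev/zero of="$work/hourly.0/big.img" bs=1M count=10 status=none
ln "$work/hourly.0/big.img" "$work/hourly.1/big.img"   # as rsnapshot would

# Measured separately, the data appears twice (~10M each):
du -sh "$work/hourly.0"
du -sh "$work/hourly.1"

# Measured in one invocation, the shared inode is counted once (~10M total):
du -shc "$work/hourly.0" "$work/hourly.1" | tail -n 1
```

So a single 'sudo du -sh' over the whole backup root, as you ran it, already counts each hard-linked file once; the 88 GB is the real on-disk figure.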
AIUI 'rsync --link-dest' hard links files on the destination only when both the file data and the file metadata are identical. If either changes, 'rsync --link-dest' considers the files to be different and does a transfer/copy.
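That behavior is easy to demonstrate with throwaway directories (a sketch; the stat format string is GNU coreutils syntax):

```shell
# Unchanged file: the second snapshot hard-links it (same inode number).
src=$(mktemp -d); back=$(mktemp -d)
echo data > "$src/file"

rsync -a "$src/" "$back/snap.0/"
rsync -a --link-dest="$back/snap.0" "$src/" "$back/snap.1/"
stat -c '%i %n' "$back/snap.0/file" "$back/snap.1/file"   # same inode

# Change only the metadata (mtime): the next snapshot stores a new copy.
touch "$src/file"
rsync -a --link-dest="$back/snap.1" "$src/" "$back/snap.2/"
stat -c '%i %n' "$back/snap.1/file" "$back/snap.2/file"   # different inodes
```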
/var/log/* is a canonical degenerate example of file-level deduplication. My Debian daily driver's /var/log is 83 MB; 34 copies of that is about 2.8 GB.
The challenge is finding big files with slightly different content, big files with identical content but different metadata, and/or large numbers of files with either or both differences.
I would start by using jdupes(1) to find identical backup files on the backup drive. Then use stat(1) or ls(1) on each group of files to find different metadata. You may want to put the commands into scripts as you figure them out.
To find files with mismatched content, I would use jdupes(1) with the --partial-only option, then use jdupes(1), stat(1), and/or ls(1) to check data and metadata as above.
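Put together, something like the following sketch (assuming jdupes is installed; the backup path is the one from your message, the file names under it are placeholders, and the stat format string is GNU coreutils syntax):

```shell
BK=/media/user/rsnapshot_backups_of_host

# Groups of byte-identical files. Without -H, files that are already
# hard-linked are not reported, so the output shows only the copies
# that are wasting space.
sudo jdupes -r "$BK" > identical-content.txt

# Candidate near-duplicates: match on partial hashes (first blocks) only.
# jdupes requires -T to be given twice to confirm this unsafe mode.
sudo jdupes -r -T -T "$BK" > partial-matches.txt

# For any suspect group, compare the metadata that rsync --link-dest
# checks: size, mtime, owner/group, mode.
stat -c '%s %Y %U:%G %a %n' \
    "$BK"/hourly.0/some/file "$BK"/hourly.1/some/file
```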
[BTW, the rsnapshot backups don't seem to take too much time, but doing rsync of external usb backup drive A to external usb backup drive B does take over 90 minutes each time. And that's once a day, every day! Most of that time is apparently not data transfer, but rsync building the file lists it needs each time.]
COW file systems such as ZFS offer a different time vs. space trade-off: snapshots share unchanged blocks, so creating one is nearly instantaneous and only modified data consumes additional space.
Here is the command I use to rsync backup drive A (/media/default/MSD00001) to backup drive B (/media/default/MSD00002):

time sudo rsync -aAXHxvv --delete-after --numeric-ids \
    --info=progress2,stats2,name2 \
    --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
    /media/default/MSD00001/ /media/default/MSD00002/
I do not use the --numeric-ids option. I use matching username/UID and groupname/GID on all of my Debian and FreeBSD hosts. I want user/group name translation on my Windows/Cygwin and macOS hosts.
Your -v, -v, and --info options are going to generate a lot of output. I typically use the --progress and --stats options, and request more only when trouble-shooting.
I do not use the --exclude option. If and when a system crashes, I want everything, including files that an intruder may have placed in locations that are commonly not backed up. This means that when using the -x option, I must make sure to back up all file systems explicitly.
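Since -x stops rsync at file-system boundaries, it helps to enumerate what is mounted before writing the backup job; a sketch using findmnt (adjust the fstype filter to match your systems):

```shell
# List real (non-pseudo) mounted file systems so each one can be named
# explicitly in the backup command; -x will not descend into any of them.
findmnt -rn -o TARGET,FSTYPE,SOURCE -t ext4,xfs,btrfs,vfat
```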
I just wanted to clarify: Each time backup drive A is rsync'd to backup drive B, much more than /media/user/MSD00001/rsnapshot_backups_of_host is being rsync'd. All of /media/user/MSD00001 is being rsync'd, which is somewhere around 900Gb. Maybe that is why each rsync takes over 90 minutes!
Please run the above rsync(1) command without -v -v or --info, and with --stats. Post your console session.
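i.e., something along these lines (same paths and excludes as your command; the --stats summary should show where the 90 minutes are going):

```shell
time sudo rsync -aAXHx --delete-after --numeric-ids --stats \
    --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
    /media/default/MSD00001/ /media/default/MSD00002/
```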
Use nmon(1) to watch the backup drives when doing the transfer. Tell us what you see.
David