
Re: I uninstalled OpenMediaVault (because totally overkill for me) and replaced it with borgbackup and rsync



On 9/5/23 17:39, Default User wrote:
On Tue, 2023-09-05 at 20:01 -0400, Default User wrote:

Now sudo du -sh / says that / seems to be using about 30 GB. But sudo
du -sh /media/user/rsnapshot_backups_of_host says that the backup
directory, /media/user/rsnapshot_backups_of_host on backup drive A, is
using a whopping 88 GB for 24 hourly, 7 daily, and 3 weekly!


That is better than (24+7+3) * 30 GB = 1020 GB.


88 GB - 30 GB = 58 GB of churn over 24 hours, 7 days, and/or 3 weeks may be reasonable for your workload. Are you doing multimedia content creation? Databases? Disk imaging? Anything else big?


I am thinking, that CAN'T be right.
Maybe each hard link is being counted as a full, actual file, when
adding up the space allegedly used.

So, how can I determine how much space is really being used for the
backups?


AIUI 'rsync --link-dest' hard links files on the destination only when both the file data and the file metadata are identical. If either changes, 'rsync --link-dest' considers the files to be different and does a transfer/copy.
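The hard-link accounting is easy to verify directly. A minimal sketch (GNU coreutils on Linux assumed; the paths are throwaway): du(1) counts each inode once per invocation, so a snapshot tree made purely of hard links adds almost nothing when it is measured together with its peer -- which is why the 88 GB figure from a single du run over the whole backup directory already reflects real disk usage:

```shell
#!/bin/sh
# Sketch: du counts each inode once per invocation, so hard-linked
# snapshot trees report real usage only when measured together.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/snap1" "$tmp/snap2"
dd if=/dev/zero of="$tmp/snap1/big" bs=1M count=10 2>/dev/null
ln "$tmp/snap1/big" "$tmp/snap2/big"   # what --link-dest effectively does
stat -c %h "$tmp/snap1/big"            # link count: 2
du -sh "$tmp/snap1" "$tmp/snap2"       # snap2 shows ~0: inode already counted
du -sh "$tmp"                          # total is ~10M, not 20M
rm -rf "$tmp"
```

So per-snapshot du runs over-count shared data; one du over the whole tree does not.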


/var/log/* is a canonical degenerate example of file-level deduplication. My Debian daily driver /var/log is 83 MB. 34 copies of that is 2.8 GB.


The challenge is finding big files with slightly different content, big files with identical content but different metadata, and/or large numbers of files with either or both differences.


I would start by using jdupes(1) to find identical backup files on the backup drive. Then use stat(1) or ls(1) on each group of files to find different metadata. You may want to put the commands into scripts as you figure them out.


To find files with mismatched content, I would use jdupes(1) with the --partial-only option, then jdupes(1), stat(1), and/or ls(1) to check data and metadata as above.


[BTW, the rsnapshot backups don't seem to take too much time, but doing
rsync of external usb backup drive A to external usb backup drive B
does take over 90 minutes each time. And that's once a day, every day!
Most of that time is apparently not for data transfer, but for rsync
building the indexes it needs each time.]


COW file systems such as ZFS provide a time vs. space caching trade-off.


Here is the command I use to rsync backup drive A
(/media/default/MSD00001) to backup drive B (/media/default/MSD00002):

time sudo rsync -aAXHxvv --delete-after --numeric-ids \
    --info=progress2,stats2,name2 \
    --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
    /media/default/MSD00001/ /media/default/MSD00002/


I do not use the --numeric-ids option. I use matching username/UID and groupname/GID on all of my Debian and FreeBSD hosts. I want user/group name translation on my Windows/Cygwin and macOS hosts.


Your -v, -v, and --info options are going to generate a lot of output. I typically use the --progress and --stats options, and request more only when troubleshooting.


I do not use the --exclude option. If and when a system crashes, I want everything, including files that an intruder may have placed in locations that are commonly not backed up. This means that when using the -x option, I must make sure to back up all file systems.
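Since -x stops rsync at filesystem boundaries, each mounted filesystem needs its own explicit source. One way to enumerate them is a sketch like this (GNU df assumed; the excluded pseudo-filesystem types are illustrative):

```shell
# List mounted filesystems so each gets its own rsync source when
# -x (one-file-system) is in effect. GNU coreutils df assumed.
df --output=target -x tmpfs -x devtmpfs -x squashfs \
    | tail -n +2
```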


I just wanted to clarify:

Each time backup drive A is rsync'd to backup drive B, much more than
/media/user/MSD00001/rsnapshot_backups_of_host is being rsync'd.  All
of /media/user/MSD00001 is being rsync'd, which is somewhere around
900 GB. Maybe that is why each rsync takes over 90 minutes!


Please run the above rsync(1) command, without -v -v --info, and with --stats. Post your console session.


Use nmon(1) to watch the backup drives when doing the transfer. Tell us what you see.


David

