
Re: Backup solutions without reinventing the wheel these days



Take a look at the "--link-dest" option of rsync; it provides deduplication at the file level. That may work well for storing snapshots more efficiently in your use case, given that your data consists of many very small files. Since it uses hardlinks, it cannot deduplicate files that have identical content but differ in any of the filesystem-level metadata that rsync is told to preserve (for a backup that usually includes permissions, owner, group and modification time).

You can create one directory for each snapshot within a directory for backups, then each time you add a new snapshot, pass the previous snapshot to "--link-dest".

For example, if you store your backups under directory "/backups", and the previous backup is under directory "/backups/2015-10-13", then to make a new backup for today, use "rsync [OTHER-OPTIONS] --link-dest=/backups/2015-10-13 [SOURCE] /backups/2015-10-20". You may use "--link-dest" several times with different directories.
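As a minimal sketch, a daily run could look like the following; the paths, the remote source and the "latest" symlink are just illustrative assumptions, not part of anything you described:

  #!/bin/sh
  # Sketch of a daily snapshot using rsync --link-dest.
  # /backups, user@server:/srv/data/ and the "latest" symlink are assumptions.
  TODAY=$(date +%F)                    # e.g. 2015-10-20
  DEST=/backups/$TODAY
  PREV=/backups/latest                 # symlink to the previous snapshot

  rsync -a --delete --link-dest="$PREV" \
      user@server:/srv/data/ "$DEST/"

  # Repoint "latest" so the next run hardlinks against today's snapshot.
  ln -sfn "$DEST" /backups/latest

Unchanged files then appear as additional hardlinks, so every snapshot looks like a full copy while only changed files consume new space.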

To be able to detect corruption of the backup (as opposed to corruption of the live data), compute hashes of the files. You can use "find . -type f -print0 | xargs -0 sha256sum > [HASHES-FILE]". Then take a hash of this list of hashes and store it in at least 2 places, so that you will be able to detect corruption of the list of hashes and distinguish it from corruption of the snapshots.
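As a rough illustration, using the snapshot from the example above (all paths are placeholders), the hashes could be generated and later verified like this; keeping the hashes file outside the snapshot directory avoids it being picked up by find:

  # Generate the list of hashes for today's snapshot.
  cd /backups/2015-10-20
  find . -type f -print0 | xargs -0 sha256sum > /backups/2015-10-20.sha256
  sha256sum /backups/2015-10-20.sha256   # store this one line in at least 2 places

  # Later, check the snapshot against the stored hashes.
  cd /backups/2015-10-20
  sha256sum -c --quiet /backups/2015-10-20.sha256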

Regards.

On 20/10/15 at 11:57, Ondřej Grover wrote:
Hello,

I'm looking for recommendations for backup solutions that don't reinvent
the wheel and are reliable and widely used. I want to back up two servers to a
backup server. The main data content is several hundred GB in many very
small files.

I really like the idea behind backupninja, because it provides a
centralized solution to the cron + ssh transfer (rsync) + mail paradigm and
alleviates the need to write one's own elaborate scripts. It also provides
the most common backup helper scripts with sensible defaults. The mail
reporting part isn't that great (it does not log data transfers consistently
across the different transfer helpers), but that can be fixed with a few
custom shell scripts.

However, I found that for my use-case rdiff-backup runs out of memory on
the backup server (1GB RAM + 1GB swap) and duplicity creates a signature
file of over 50 GB. I could use just plain rsync, but incremental backups +
compression would be a nice feature, as data corruption may not become
apparent immediately.

I've also looked at the new kids on the block like obnam, attic and
borgbackup. They look interesting, but I prefer time-tested software for
backups. After realizing that these new backup programs pretty much try to
replicate features of btrfs or ZFS (incremental snapshots, block-level
compression and deduplication), I started thinking that I could perhaps
just send the data to the backup server via rsync, save it to a btrfs or
ZFS filesystem (though the backup server may not have enough RAM for ZFS)
and create daily snapshots on the server. If memory permits (after some
optimization), I'd go with ZFS, as it should be more reliable. Does anybody
use such a solution?
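As a rough sketch, that rsync + btrfs variant could look like the following, assuming /backups/data is itself a btrfs subvolume; all paths here are illustrative:

  # Pull the data, then freeze it as a read-only snapshot for the day.
  rsync -a --delete user@server:/srv/data/ /backups/data/
  btrfs subvolume snapshot -r /backups/data /backups/snapshots/$(date +%F)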

I also had a look at Bacula, but it seemed that it does not offer
block-level deduplication and compression at the moment.

I'm looking forward to your recommendations.

Kind regards,
Ondřej Grover


