
Re: Backup solutions without reinventing the wheel these days



Take a look at the "--link-dest" option of rsync; it provides deduplication at the file level. That may work well for storing snapshots more efficiently in your use case, given that your data consists of many very small files. Since it uses hardlinks, it cannot deduplicate files that have identical content but differ in any of the filesystem-level metadata that rsync is told to preserve (for a backup that usually includes permissions, owner, group and modification time).

You can create one directory for each snapshot within a directory for backups, then each time you add a new snapshot, pass the previous snapshot to "--link-dest".

For example, if you store your backups under directory "/backups", and the previous backup is under directory "/backups/2015-10-13", then to make a new backup for today, use "rsync [OTHER-OPTIONS] --link-dest=/backups/2015-10-13 [SOURCE] /backups/2015-10-20". You may use "--link-dest" several times with different directories.
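As a minimal sketch, a daily run could look like the following; the paths, the remote source and the "latest" symlink are just illustrative assumptions, not part of anything you described:

  #!/bin/sh
  # Sketch of a daily snapshot using rsync --link-dest.
  # /backups, user@server:/srv/data/ and the "latest" symlink are assumptions.
  TODAY=$(date +%F)                    # e.g. 2015-10-20
  DEST=/backups/$TODAY
  PREV=/backups/latest                 # symlink to the previous snapshot

  rsync -a --delete --link-dest="$PREV" \
      user@server:/srv/data/ "$DEST/"

  # Repoint "latest" so the next run hardlinks against today's snapshot.
  ln -sfn "$DEST" /backups/latest

Unchanged files then appear as additional hardlinks, so every snapshot looks like a full copy while only changed files consume new space.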

To be able to detect corruption of the backup (as opposed to corruption of the live data), compute hashes of the files. You can use "find . -type f -print0 | xargs -0 sha256sum > [HASHES-FILE]". Then take a hash of this list of hashes and store it in at least 2 places, so that you will be able to detect corruption of the list of hashes and distinguish it from corruption of the snapshots.
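As a rough illustration, using the snapshot from the example above (all paths are placeholders), the hashes could be generated and later verified like this; keeping the hashes file outside the snapshot directory avoids it being picked up by find:

  # Generate the list of hashes for today's snapshot.
  cd /backups/2015-10-20
  find . -type f -print0 | xargs -0 sha256sum > /backups/2015-10-20.sha256
  sha256sum /backups/2015-10-20.sha256   # store this one line in at least 2 places

  # Later, check the snapshot against the stored hashes.
  cd /backups/2015-10-20
  sha256sum -c --quiet /backups/2015-10-20.sha256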

Regards.

On 20/10/15 at 11:57, Ondřej Grover wrote:
Hello,

I'm looking for recommendations for backup solutions that don't reinvent
the wheel and are reliable and widely used. I want to back up two servers to a
backup server. The main data content is several hundred GB in many very
small files.

I really like the idea behind backupninja, because it provides a
centralized solution to the cron + ssh transfer (rsync) + mail paradigm and
alleviates the need to write one's own elaborate scripts. It also provides
the most common backup helper scripts with sensible defaults. The mail
reporting part isn't that great (it does not log data transfers consistently
across the different transfer helpers), but that can be fixed with a few
custom shell scripts.

However, I found that for my use-case rdiff-backup runs out of memory on
the backup server (1GB RAM + 1GB swap) and duplicity creates a signature
file of over 50 GB. I could use just plain rsync, but incremental backups +
compression would be a nice feature, as data corruption may not become
apparent immediately.

I've also looked at the new kids on the block like obnam, attic and
borgbackup. They look interesting, but I prefer time-tested software for
backups. After realizing that these new backup programs pretty much try to
replicate features of btrfs or ZFS (incremental snapshots, block-level
compression and deduplication), I started thinking that I could perhaps
just send the data to the backup server via rsync, save it to a btrfs or
ZFS filesystem (though the backup server may not have enough RAM for ZFS)
and create daily snapshots on the server. If memory permits (after some
optimization), I'd go with ZFS, as it should be more reliable. Does anybody
use such a solution?
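As a rough sketch, that rsync + btrfs variant could look like the following, assuming /backups/data is itself a btrfs subvolume; all paths here are illustrative:

  # Pull the data, then freeze it as a read-only snapshot for the day.
  rsync -a --delete user@server:/srv/data/ /backups/data/
  btrfs subvolume snapshot -r /backups/data /backups/snapshots/$(date +%F)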

I also had a look at Bacula, but it seemed that it does not offer
block-level deduplication and compression at the moment.

I'm looking forward to your recommendations.

Kind regards,
Ondřej Grover


