
Re: What tool can I use to make efficient incremental backups?



On Monday 21 August 2017 23:43:09 Celejar wrote:

> On Sun, 20 Aug 2017 02:05:46 -0400
>
> Gene Heskett <gheskett@shentel.net> wrote:
> > On Saturday 19 August 2017 23:07:01 Celejar wrote:
> > > On Thu, 17 Aug 2017 11:47:34 -0500
> > >
> > > Mario Castelán Castro <marioxcc.MT@yandex.com> wrote:
> > > > Hello.
> > > >
> > > > Currently I use rsync to make the backups of my personal data,
> > > > including some manually selected important files of system
> > > > configuration. I keep old backups to be more safe from the
> > > > scenario where I have deleted something important, I make a
> > > > backup, and I only notice the deletion afterwards.
> > > >
> > > > Each backup snapshot is stored in its own directory. There is
> > > > much redundancy between subsequent backups. I use the option
> > > > "--link-dest" to make hard links and thus save space for files
> > > > that are *identical* to an already-existing file in the backup
> > > > repository, but this is still inefficient. Any change to a file,
> > > > even to its metadata (permission, modification time, etc.), will
> > > > result in the file being saved whole, instead of as a delta.
> > > >
> > > > Can you suggest a more efficient alternative?
> > >
> > > There's Borg, which apparently has good deduplication. I've just
> > > started using it, but it's a very sophisticated and quite popular
> > > piece of software, judging by chatter in various internet threads.
> > >
> > > https://borgbackup.readthedocs.io/en/stable/
> > >
> > > Celejar
> >
> > Amanda has quite intelligent ways to do that. I run it nightly and
> > have
>
> [Snipped lots of miscellaneous, but seemingly irrelevant, discussion
> about Amanda's virtues.]
>
> Amanda does deduplication? Link?
>
> Celejar

Amanda does not do this "deduplication", as far as I am aware.

That is another aspect of data control that does not belong in the job
description of a backup program. Its job is to be a repository, on some
storage medium other than the day-to-day operating cache, of the data
you will need to recover and restore normal operations should your main
drive become unusable, having shown no signs of ill health until it
falls over.

The backup program should be relatively simple, so dependable it's
boring, yet smart enough to adjust its internal schedule of backup
levels so as to use as much of the storage media as it needs in a
long-term continuous-use scenario. Amanda carries this to extremes, but
you may have to help it occasionally if a given entry in the disklist
grows until a level 0 backup no longer fits on the amount of media you
allow it to use per run.  As I've added machines to the list as they've
been added to my home network over the last 20 years, I find myself
needing to either buy a bigger drive or further break up my home
directory into smaller pieces to reduce the total size of that one
entry.  But amanda will never throw you under the bus: if a full won't
fit, it continues to do level 1's or even level 2's.

A level 1 is anything that has changed since the last level 0, a level 2 
is anything changed since the previous level 1, etc etc.
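
As a rough illustration of the levels just described, here is a minimal
Python sketch. It is not Amanda's code: the function names and the
dict-based bookkeeping are made up for the example, and it assumes the
common dump-style rule that a level N run compares against the most
recent run at any lower level, which matches the description above when
the levels are run in order.

```python
def reference_time(level, last_run):
    """Return the timestamp a backup at this level compares against.

    last_run maps backup level -> timestamp of the most recent run at
    that level (hypothetical bookkeeping, for illustration only).
    A level 0 is a full backup, so it has no reference time.
    """
    if level == 0:
        return None
    lower = [t for lvl, t in last_run.items() if lvl < level]
    if not lower:
        raise ValueError("no lower-level backup exists; run a level 0 first")
    return max(lower)


def files_for_level(level, last_run, mtimes):
    """Select the files a backup at the given level would include.

    mtimes maps file path -> modification timestamp.
    """
    ref = reference_time(level, last_run)
    if ref is None:
        return sorted(mtimes)  # full backup: everything
    return sorted(path for path, m in mtimes.items() if m > ref)


if __name__ == "__main__":
    last_run = {0: 100, 1: 200}
    mtimes = {"a": 50, "b": 150, "c": 250}
    for lvl in (0, 1, 2):
        print(lvl, files_for_level(lvl, last_run, mtimes))
```

With a level 0 at time 100 and a level 1 at time 200, a new level 1
would pick up "b" and "c", while a level 2 would pick up only "c".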

Amanda is an administrator program, using, at the PFC level of the
duties, usually tar and gzip (but it can use other compressors) for the
actual data moving.  It keeps records over the span of time you set it
up to use, so it knows where everything it has backed up is.  But
because those records are stored on the daily use drive, they aren't of
much utility if you need to do a bare metal recovery, so I wrote a
wrapper that adds this database, and a copy of the configuration that
made the backup, to the end of every backup it makes. So I can, and
actually have, done a bare metal recovery to a new drive in around 8
hours. The only thing I lost was about 75 emails that had come in since
the nightly backup a few hours before that failure.
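
The wrapper itself is not shown in this message. Purely as a sketch of
the idea (bundle the records and the configuration and store them with
the backup), something along these lines would do in Python. The
function name, the archive name, and every path are hypothetical and
are not taken from the actual wrapper or from Amanda.

```python
import tarfile
from pathlib import Path


def bundle_recovery_metadata(index_dir, config_dir, dest_dir, label):
    """Pack the backup records and configuration into one tarball.

    index_dir  -- directory holding the backup program's records (assumed)
    config_dir -- directory holding the configuration in use (assumed)
    dest_dir   -- directory on the backup medium to write into
    label      -- name of this run, used in the archive file name
    Returns the path of the archive that was written.
    """
    dest = Path(dest_dir) / f"{label}-recovery-metadata.tar.gz"
    with tarfile.open(str(dest), "w:gz") as tar:
        # Fixed top-level names so a restore does not depend on where
        # the directories lived on the drive that died.
        tar.add(str(index_dir), arcname="index")
        tar.add(str(config_dir), arcname="config")
    return dest
```

Run after each nightly backup, this would leave the backup medium with
everything needed to find and restore the data without the original
drive.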

Backups are so much a personal preference thing that it's hard to define.

Some folks are so used to doing a full backup on Friday night, one that
may take 50 terabytes worth of tapes and a tape library that costs
$50,000, that they simply cannot wrap their minds around a program that
does a level 0 of a given disklist entry on any arbitrary nightly run,
and that keeps track of a system such as the NY State Health system,
doing it on a tape a night.

They can't get that amanda keeps records, and that if you need to
recover the home directories of Joe and Jane Sixpack who work in sales,
amanda will look up the last level 0, restore that, and restore over
that from the various other level 1 or 2 backups made since, until it
arrives at and recovers anything of theirs in last night's backup. I am
backing up 5 machines here, using 20 to 32 GB worth of space a night on
a separate 1 TB drive that's currently about 78% full.
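
The lookup described above (last level 0, then the later incrementals
on top of it) can be sketched in a few lines of Python. This is only an
illustration of the bookkeeping, not Amanda's recovery code; the
function name and the (timestamp, level) record format are assumptions.

```python
def restore_chain(backups):
    """Return the backups to restore, oldest first.

    backups is a list of (timestamp, level) tuples in any order.
    A run at level N makes every earlier run at level N or higher
    unnecessary, so those are dropped as the history is replayed.
    """
    chain = []
    for ts, level in sorted(backups):
        while chain and chain[-1][1] >= level:
            chain.pop()
        chain.append((ts, level))
    if not chain or chain[0][1] != 0:
        raise ValueError("no level 0 backup to start the restore from")
    return chain


if __name__ == "__main__":
    history = [(1, 0), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2)]
    print(restore_chain(history))
```

For the history in the example, the restore would use the level 0 from
run 1, the level 1 from run 5, and the level 2 from run 6.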

You can make up your own mind, but to me amanda has been a good thing.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

