
Re: Home made backup system



On 2019-12-18 09:02, rhkramer@gmail.com wrote:
Aside / admission: I don't back up everything I should, or as often as I should,
so I'm looking for ways to improve.  One thought I have is to write my own
backup "system" and use it; I've thought about that a little, and I provide
some of my thoughts below.

A purpose of sending this to the mailing list is to find out whether there already
exists a solution (or parts of a solution) close to what I'm thinking about
(no sense re-inventing the wheel), or whether someone thinks I've overlooked
something or am making a big mistake.

Part of the reason for doing my own is that I don't want to be trapped into
using a system that might disappear or change and leave me with a problem.  (I
subscribe to a mailing list for one particular backup system, and I wrote to
that list with my concerns and a little of my thinking about my own system.  At
the time, I was hoping for a "universal" configuration file -- a file that
would specify what, where, when, and how each file, directory, or partition to
be backed up would be treated -- one that could be read and acted upon by a
great variety of backup programs, and maybe all future ones.)

The only response I got (IIRC) was that since their program was open source,
it would never go away.  (Yet, if I'm not mixing up backup programs, they were
transitioning from Python 2 to Python 3 as the underlying language --
I'm not sure Python 2 will ever completely go away or become non-functional,
but it reinforces my belief / fear that any complex backup program, even
open source, could someday become unusable.)

So, here are my thoughts:

After I thought about (hoped for) a universal config file for backup programs,
and since it seems no such thing exists (not surprising), I thought I'd try
to create my own.  This morning, as I thought about it a little more (despite
a headache and a non-working car that I should be working on), it struck me that
the simplest thing for me to do is write a bash script and a bash subroutine,
something along these lines:

    * the backups should be in formats such that I can access them with a variety
of other tools (as appropriate) if I need to -- if I back up an entire
directory or partition, I should be able to easily access and restore any
particular file from within that backup, and do so even if it is encrypted (i.e.,
encryption would be done by "standard programs" (a bad example might be
ccrypt) that I could use outside of the backup system)

    * the bash subroutine (command) that I write should basically do the
following:

       * check that the specified target exists (for things like removable
drives or NAS devices) and has sufficient space (I'm not sure I can tell that
until after the backup is attempted), or, for an encrypted drive, that it is
mounted / decrypted, i.e., available to write to

       * if the right conditions (above) don't exist, tell me (I'm thinking of
email, since email is something that always gets my attention -- maybe not
immediately, but soon enough)

       * if the right conditions do exist, invoke the commands to back up the
files

       * if the backup is unsuccessful for any reason, notify me (email again)

       * optionally notify me that the backup was successful (at least to the
extent of writing something)

       * optionally, actually do something to confirm that the backup is readable
/ usable (I need to think about what that could be -- maybe restore it (to /tmp or
to a ramdisk), compute something like a checksum (e.g., SHA-256 or whatever makes
sense) on it and the original file, and confirm they match)

       * ???

All of the commands invoked by the script should be parameters so that the
commands can be easily changed in the future (e.g., cp / tar / rsync, SHA-256
or whatever, ccrypt or whatever, etc.).
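
Something like the following rough, untested sketch (the mail address, file
layout, and tool choices are just placeholders):

    # do_backup SOURCE TARGET_DIR [ARCHIVE_CMD]
    # every external tool is a parameter / variable so it can be swapped later
    do_backup () {
        local src="$1" dest="$2" archive="${3:-tar czf}"
        local out="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"

        # check that the target exists and is writable
        # (a real version should also check free space, e.g. with df)
        if [ ! -d "$dest" ] || [ ! -w "$dest" ]; then
            echo "backup target $dest missing or not writable" \
                | mail -s "backup FAILED: $src" me@example.com
            return 1
        fi

        # run the (parameterized) backup command, capturing errors
        if ! $archive "$out" "$src" 2>"$out.err"; then
            mail -s "backup FAILED: $src" me@example.com < "$out.err"
            return 1
        fi

        # optionally confirm that the result is at least readable;
        # an encryption step (ccrypt or whatever) could also go here
        if ! tar tzf "$out" >/dev/null 2>&1; then
            echo "$out is not readable" \
                | mail -s "backup verify FAILED: $src" me@example.com
            return 1
        fi

        # optional success notice
        echo "$src backed up to $out" | mail -s "backup OK: $src" me@example.com
    }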

Then the master script (actually, probably scripts -- e.g., one or more each for
hourly, daily, weekly, ... backups), invoked by cron (or maybe using the at
command? -- my computers run 24/7 unless they crash, but for others, at
or something similar might be a better choice), would invoke that subroutine /
command for each file, directory, or partition to be backed up, specifying the
commands to use, what files to back up, where to back them up, encrypted or not,
compressed or not, tarred or not, etc.

In other words, instead of a configuration file, the system would just use bash
scripts with the appropriate commands, invoked at the appropriate times by
cron (or with all backup commands in one script and backup times specified
with at or similar).
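
The master scripts would then be little more than lists of calls to that
subroutine, with crontab entries to run them -- roughly (paths, sources, and
times made up purely for illustration):

    #!/bin/bash
    # daily-backups.sh -- one line per thing to back up
    . /usr/local/lib/do_backup.sh        # the subroutine sketched above

    do_backup /home/rhk/documents /mnt/nas/daily
    do_backup /etc                /mnt/nas/daily
    do_backup /var/www            /mnt/usb/daily

and the crontab would contain entries along the lines of:

    0 *  * * *   /usr/local/bin/hourly-backups.sh
    30 1 * * *   /usr/local/bin/daily-backups.sh
    0 3  * * 0   /usr/local/bin/weekly-backups.sh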

Aside: even if Amanda (for example) will always exist, I don't really want to
learn it, or any other program, that might cease to be
maintained in the future.

I wrote and use a homebrew backup and archive solution that started with a Perl script to invoke rsync (backup) and tar/ gzip (archive) over ssh from a central server according to configurable job files. My thinking was:

1. Use lowest-common-denominator tools for backups and archives -- e.g. tar, gzip, rsync:

a. This allows use with the widest range of platforms -- my clients include GNU/Linux, FreeBSD, Windows/Cygwin, and macOS.

b. Backup and archive contents are self-describing (live files and tar/ gzip archives).

c. The tools, and the backup and archive files, should be supported indefinitely.

2. Use ssh on the server to pull content from the clients, and lock down sshd on the server, so that if a client is compromised, the backups and archives are not readily accessible.
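
Conceptually, each pull job boils down to something like the following (host names, users, and paths are invented here for illustration); the lock-down on the server side is just the usual sshd_config tightening (key-only logins, a short AllowUsers list, and so on):

    # on the backup server: pull a client's /home into the backup tree
    rsync -a --delete \
          --backup --backup-dir=/backup/clients/alice/deleted \
          -e ssh \
          backup@alice.example.com:/home/ \
          /backup/clients/alice/home/

    # archives are similar, but with tar/gzip run over ssh
    ssh backup@alice.example.com 'tar czf - /home' \
        > /backup/archives/alice/home-$(date +%Y%m%d).tar.gz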


The script works, and automates what I was doing manually. It gives me ease of use and consistency.


But there are drawbacks:

1. Automating tar, gzip, and rsync was the tip of the iceberg. Additional automation was needed -- run all the daily backups, run all the daily archives, run all the weekly archives, move all archives for a given month into a month-year-stamped directory, tar and ccencrypt that directory, burn the encrypted tarballs to optical media, replicate the backup/ archive disk to other disk(s) for near-site and off-site rotation, etc. My homebrew solution has grown into a suite of scripts and a repertoire of administration tasks.

2. The backups are only one and a fraction levels deep (one complete backup and a sparse tree of deleted files). This is the result of simplistic use of rsync's --delete and --backup-dir options. Using time-stamped --backup-dir directories would provide multiple levels of deletions, but recovery would be messier (likely requiring yet another script). I have yet to investigate rsync's --link-dest option (I believe rsnapshot [1] uses this); see the rough sketch after this list.

3. No backup/ archive metadata database of who did what, from where, to where, when, how (command, arguments, output, errors), and why (cron, manual). But, the tar/ gzip/ rsync script captures stdout and stderr, and writes a job-date-time-stamped log file on each job run.

4. While all the code is in a version control system, I am loath to touch it. The code is old, my programming style has evolved considerably, and there is no functional specification, no design specification, no test suite, no issue tracking, etc. Everything was, and must be, designed, constructed, tested, and documented by hand.
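
Regarding item 2: the --link-dest approach, as I understand it, gives each run its own time-stamped directory, with unchanged files hard-linked back to the previous run, so every snapshot looks complete while only changed files consume additional space. An untested sketch, with host names and paths invented:

    today=$(date +%Y%m%d)
    rsync -a --delete \
          --link-dest=/backup/alice/latest \
          backup@alice.example.com:/home/ \
          /backup/alice/$today/
    ln -sfn /backup/alice/$today /backup/alice/latest

Recovery is then just copying files out of the relevant dated directory.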


I have been wanting to migrate to a better backup/ archive solution for years.


Recently, I migrated my SOHO file server to FreeBSD and ZFS. I then implemented zfs-auto-snapshot. This was a huge improvement. I should be able to do the same for /home on Debian (or perhaps there is a btrfs equivalent?). macOS -- perhaps. Windows -- I doubt it.
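
In essence, those snapshot tools automate commands like the following (pool and subvolume names invented; for btrfs, /home would need to be a subvolume):

    # ZFS (what zfs-auto-snapshot does on a schedule, more or less):
    zfs snapshot tank/home@auto-$(date +%Y-%m-%d-%H%M)

    # a rough btrfs counterpart, run from cron:
    btrfs subvolume snapshot -r /home /home/.snapshots/$(date +%Y-%m-%d-%H%M)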


"Backup & Recovery" by Preston is a worthwhile read [2]. It gets you thinking about the many and inter-related issues.


Understand that there is no single solution for disaster preparedness/ disaster recovery. Each tool, strategy, etc., has its strengths and weaknesses. Each administrator must tailor an overall plan that balances risk, effort, and investment. I like having diversity -- multiple platforms, multiple technologies, multiple tools, multiple geographic locations, etc. -- and I like having redundancy -- multiple computers, multiple disks, multiple media, etc.


David

p.s. Keeping working files in a version control system can provide certain backup and sneakernet features.
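
For example, a bare repository on another machine, pushed to as part of the daily routine (names invented):

    # one-time setup on the backup host
    ssh backup@server.example.com 'git init --bare /backup/repos/notes.git'

    # in the working copy
    git remote add backup backup@server.example.com:/backup/repos/notes.git
    git push backup main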

References:

[1] https://www.tecmint.com/rsnapshot-a-file-system-backup-utility-for-linux/

[2] http://shop.oreilly.com/product/9780596102463.do

