
Re: Backing up the system



On Sat, Oct 07, 2000 at 11:30:18AM +0200, Willi Dyck (wdyck@gmx.de) wrote:
> Hi all,
> 
> i wanna backup my whole system. which method do you recommand?
> should i zip / or should i write an image of /dev/hdax?
> i also want to keep all the file/user/group permissions including hidden
> files.
> what else should i think over? thanx for any help.

------------------------------------------------------------------------
Linux Backups mini-FAQ
------------------------------------------------------------------------
Karsten M. Self <kmself@ix.netcom.com>
Written:  Saturday October  7, 2000
Modified:  Saturday October  7, 2000
========================================================================


Following are my general recommendations for system backups, with a
strong focus toward an individual workstation or small network
with relatively informal practices.  I run Debian GNU/Linux; file
locations will vary somewhat on other Linux distributions and
proprietary Unices.  Larger networks or server farms should probably
look into a more structured system.

This document is divided into discussions of:

  o Hardware
  o Software
  o What to back up
  o How to back it up
  o When to back it up
  o Further reading
  o Sample backup script


------------------------------------
Hardware
--------

There are several alternatives for backup hardware.  In roughly
descending order of my preference, they are:

  o SCSI tape
  o CD-R (recordable) or CD-RW (rewritable) media
  o DLT and other high-performance tape solutions
  o Magneto-Optical
  o QIC / Travan tape
  o Networked storage
  o Removable storage (120 MB Superfloppy, Zip, Jaz, Floppy)
  o Ancillary storage (Additional onboard HD storage)



SCSI
----

I purchased an HP SureStore 2000 DAT drive in October of 1997.  It's
been used for semi-regular backups of my home system for the past three
years.  No problems; I can recommend HP Surestore SCSI storage strongly.

In comparing costs of SCSI to other tape drive types, you'll want to
take into account both drive and media costs.  At a count of about ten
media units (tapes), SCSI DAT became the cheaper option: ~$9 per 4 GB
tape rather than ~$35 for Travan/QIC cartridges.  Pricewatch lists
current costs as $3/unit for HP DAT media, while Travan 8 GB media run
$20-24.  DAT is solid, dependable, time-tested technology, and the
media are cheap and reusable.  Just what you're looking for in a
backup.
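
The break-even reasoning above can be sketched in a few lines of shell.
This is only an illustration: the ~$400 DAT drive and the ~$9 and ~$35
media prices come from this document, but the $140 Travan drive price is
a hypothetical placeholder, and break_even is a made-up helper name.

```shell
#!/bin/bash
# Usage: break_even DRIVE_A MEDIA_A DRIVE_B MEDIA_B   (whole dollars)
# Prints the smallest number of media units at which option A's total
# cost (drive plus media) is no higher than option B's.
break_even() {
    local drive_a=$1 media_a=$2 drive_b=$3 media_b=$4
    local n=0
    while [ $((drive_a + n * media_a)) -gt $((drive_b + n * media_b)) ]; do
        n=$((n + 1))
        # Give up if A never catches up (its media are not cheaper).
        if [ "$n" -gt 10000 ]; then
            echo "never"
            return 1
        fi
    done
    echo "$n"
}

# DAT: ~$400 drive, ~$9 per tape.  Travan: ~$35 per cartridge and a
# HYPOTHETICAL $140 drive (not a quoted price).
break_even 400 9 140 35
```

Plug in real quotes for both drives before drawing any conclusion; the
result is very sensitive to the drive price gap.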

What's nice about SCSI tape is that in three years of admittedly light
use -- say 1-8 times/month -- I've never had a write error.  While
backups take some time, they only need to be run once.  My current
backup script hits key system files, announces (via wall) its progress,
then rewinds and verifies the tape, and finally rewinds and ejects it.
Minimal fuss.  I've been backing up more frequently in the past six
months or so -- every couple of days if I can help it.

The downside is that tape capacity, relative to today's drive sizes, is
limited.  My system, which cost about $400 new, provides ~4 GB of
compressed storage.  That works for me, but you'll have to look at
higher-capacity tape drives for the 9-40 GB disks out now.  A comparably
priced drive today would have about 10-20 GB capacity, which should
suffice.


       Comparative pricing of 4mm SCSI DAT units
       -----------------------------------------
                    Capacity (GB)   Drive    Media
      Vendor        raw/cmpr        cost     (per tape)
      -------------------------------------------------
      HP            2/4             $120        $3
      HP            4/8             $185        $6
      HP           12/24            $603       $11 *
      HP           20/40            $810       $27 **
      -------------------------------------------------
        Notes:
          * Other vendor pricing starts at ~$320.
          ** Other vendor pricing lower.
      -------------------------------------------------


I'm citing Hewlett Packard largely as they have a good name in quality,
both anecdotally and in direct personal experience for work and private
use.  Sony's 12/24 and 20/40 drives are about half the cost of the
equivalent HP drive.  At the upper end of the range, tape changers start
appearing, with compound capacities into the hundreds of GB.



CDR / CDRW
----------

Of the remaining alternatives, CDR/CDRW and QiC/Travan are probably the
most popular in current systems, and might be considered a necessity.
CD-RW drives start at about $130, with good branded drives running
$150-$160.  High-end is a Creative Labs 2x4x32x CD-RW for $189.  For
media, a CD-R 200 pack runs $147 - $261, or about $1.13 - $2.01/GB at
650 MB per disc.

While I don't have specific experience with CDs, my understanding is
that they're sensitive to buffering, and it's often helpful to create an
online image file, then cut to media, requiring additional online
storage.  Media size, at 650 MB, is significantly limited relative to
tape -- it would take 31 CDs to match the capacity of one 20 GB tape.
There are also reliability issues: CDRWs *must* be tested before they're
considered good, and may not function in all drives.
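
The media arithmetic in this section is easy to redo for your own
numbers.  A small sketch, using integer shell arithmetic and taking
1 GB as 1000 MB; media_needed and cents_per_gb are hypothetical helper
names, not existing tools.

```shell
#!/bin/bash
# Number of media units needed to hold TOTAL_MB, rounding up.
media_needed() {
    local total_mb=$1 media_mb=$2
    echo $(( (total_mb + media_mb - 1) / media_mb ))
}

# Cost per GB, in whole cents, of a pack of DISCS media holding
# MEDIA_MB each and costing PACK_CENTS in total.
cents_per_gb() {
    local pack_cents=$1 discs=$2 media_mb=$3
    echo $(( pack_cents * 1000 / (discs * media_mb) ))
}

media_needed 20000 650        # CDs to match one 20 GB tape
cents_per_gb 14700 200 650    # the $147 pack of 200 CD-Rs
```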

My recommendation would be to use CDR/CDRW if you have it and are
satisfied, but to explore a SCSI solution if your needs aren't met.


Other Media
-----------

DLT and other high-performance tape solutions are more likely to be
found in professional or commercial settings.  While generally reliable,
flexible, and fast, they're beyond the pale for the typical home user. 

Magneto-Optical has had a rough life, though it's a fundamentally solid
technology.  For removable random-access rewriteable storage, it's
strongly recommended, though it is expensive, slower than pure
magnetic media, and typically offers less immediately accessible storage
-- a 1 GB MO disk has two 512 MB sides.

QIC / Travan tape, as indicated above, is not cost-effective when media
counts exceed 10-15 units.  SCSI is recommended instead.  If you already
have such a drive, you don't need to replace it, though you may want to
evaluate your storage needs; you may find it more effective to do so.

Networked storage is the practice of saving local files on other systems
in your local (or remote) network.  This can be an effective solution,
though you may lack the flexibility and redundancy possible by
inexpensive removable media backups.

Removable storage, typically magnetic, random-access media (as opposed
to serial-access media such as tape), is generally *strongly*
discouraged for backup of all but the smallest or most sensitive data,
on the basis of cost, reliability, and convenience.  Traditional 3.5"
1.44 MB floppy disks are actually one of the most expensive storage
formats available, in $/MB.  LS-120 superfloppy and Zip disks are
reasonably good ways to store mid-sized archives of ~100 MB, though
reliability may be an issue.  For full-system backups, they are not an
option.  I simply cannot recommend Jaz disks.  The only question in
their use has been when, not if, both disks and drives fail.  When they
do so, you lose data in expensive multiples of 2 GB.  I went through
three drives and ten disks in about 18 months of use before I threw in
the towel.  Winchester storage and removable media are mutually
exclusive concepts.

Ancillary storage.  Additional onboard HD storage is actually one of the
cheapest ways of storing data -- at current IDE disk prices, storage is
about $4 per *Giga*byte.  The solution is fast, flexible, and
convenient.  What it is not is reliable -- you are limited to a single
storage unit, and if storage loss is related to an event directly
affecting the system, including loss, theft, physical, or electronic
damage, you have no backups.  I'd recommend looking at a RAID or
mirroring solution to provide additional redundancy instead, *combined*
with an offline storage alternative.



------------------------------------
Software
--------

Unless you have specific requirements to meet (e.g., management
can't keep from mucking with a technical decision), I'd choose the
simplest backup methods possible.  My own local backup script is:

    #!/bin/bash

    # Create backups of /etc, /home, /usr/local, and...

    mt rewind
    tar cvf /dev/nst0 /etc
    tar cvf /dev/nst0 /home
    tar cvf /dev/nst0 /usr/local

    # and selected /var directories
    tar cvf /dev/nst0 /var/backups
    tar cvf /dev/nst0 /var/cache/apt
    tar cvf /dev/nst0 /var/lib
    tar cvf /dev/nst0 /var/log
    tar cvf /dev/nst0 /var/www
    mt rewoffl

Tar isn't the sexiest thing out there (honey is <g>), but damned if it
doesn't work, and if the tools for accessing archives aren't available
on every flavor of Unix, and most lesser operating systems, not to
mention boot, rescue, and minimal installations of Linux.  You *will* be
able to get at your data.
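
To make the point concrete, here is a minimal round trip with tar:
create, list, extract, compare.  It is a sketch under assumptions: an
ordinary file in a temporary directory stands in for /dev/nst0, and
tar_roundtrip and the demo file names are made up for illustration.

```shell
#!/bin/bash
tar_roundtrip() {
    local work rc
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/etc" "$work/restore"
    echo "hostname=example" > "$work/src/etc/demo.conf"

    # Create: -C changes directory first, so stored paths are relative.
    tar cf "$work/backup.tar" -C "$work/src" etc &&
    # List the archive contents without extracting anything.
    tar tf "$work/backup.tar" &&
    # Extract into a scratch directory, never straight over a live system.
    tar xf "$work/backup.tar" -C "$work/restore" &&
    # Confirm the restored copy is byte-for-byte identical.
    cmp "$work/src/etc/demo.conf" "$work/restore/etc/demo.conf" &&
    echo "restore matches"
    rc=$?

    rm -rf "$work"
    return $rc
}

tar_roundtrip
```

Practicing a restore this way, before you need one, is the cheapest
test of a backup method.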

Other general recommendations -- dump, cpio, and afio.  I'd generally
*avoid* using an integrated backup management solution -- far less
portable, and you may *not* be able to get at your data, unless you are
part of a large and well-supported organization.  You get some pluses
-- usually a searchable index or other log of what was archived, but it
costs you in terms of flexibility.

The advantages of various alternatives:

  o dump:  creates a directory of archives and an access interface.
    It's possible to navigate through this when restoring from backup.
    Downside:  some filesystem formats aren't supported, you may not be
    able to access your archives from another system.

  o cpio:  greater internal consistency and integrity controls than tar,
    backward compatibility with tar and other formats.  May handle types
    of files which aren't supported by tar.
    Downside:  I have to read the man page *and* _Linux in a Nutshell_
    every time I want to use it.  If you thought tar was nonintuitive,
    try cpio.

  o afio:  yet another advanced archive manipulation utility.  Similar
    to cpio in most respects.
    Downsides:  I know even less of afio than cpio.  Nonstandard is not
    good WRT backups.
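
For comparison with tar, a minimal cpio round trip might look like the
following.  It is a sketch, not a recommendation: cpio_roundtrip and the
demo file are hypothetical, a plain file stands in for the tape device,
and the function simply skips itself where cpio isn't installed.

```shell
#!/bin/bash
cpio_roundtrip() {
    local work rc
    if ! command -v cpio > /dev/null 2>&1; then
        echo "cpio not installed; skipping"
        return 0
    fi
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/etc"
    echo "demo" > "$work/src/etc/demo.conf"

    # Unlike tar, cpio takes the list of files to archive on standard
    # input (here from find) and writes the archive to standard output.
    ( cd "$work/src" && find . -depth -print | cpio -o > "$work/backup.cpio" ) 2> /dev/null

    # -i reads an archive from standard input; -t lists, not extracts.
    cpio -it < "$work/backup.cpio" 2> /dev/null
    rc=$?

    rm -rf "$work"
    return $rc
}

cpio_roundtrip
```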



------------------------------------
What to Back Up
---------------

The general rules are this:

  o You want to back up that which you can't readily restore from other
    sources.  

  o You don't want to back up that which you can readily restore
    from other sources.

  o You don't want to back up that which you aren't interested in
    preserving.


My own backup script (/usr/local/sbin/system-backups), which I run
weekly (or weakly), is reproduced at the end of this document.  Note
the /var/cache/apt line, which is specific to Debian -- you may want to
include the RedHat equivalent, essentially the RPM database.

Generally speaking, you're not interested in:

  o /tmp
  o /usr (except for /usr/local)
  o bits and pieces of /var

You absolutely want:

  o /home
  o /etc
  o /usr/local

You probably want:

  o Bits and pieces of /var
  o (probably) /root (which I should add, thinking of it now).
  o (possibly) /boot
  o Other local filesystems outside the FHS (Filesystem Hierarchy
    Standard).

...the philosophy being that you can reconstruct your distribution from
package information (and would probably benefit from an upgrade anyway).
You *can't* recover localized data and system configurations from a
generic image, CD, or net archive.

Protect what's valuable to you.

It might also make sense to keep a record of your disk partition layout
(fdisk -l /dev/<your device here>) and related hardware information.
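
One way to capture that information is sketched below.  The save_layout
name and the output file names are made up for illustration, and the
target directory in the usage comment is only a suggestion; put the
output somewhere your backups already cover.  fdisk -l generally needs
root, so a failure there is tolerated rather than treated as fatal.

```shell
#!/bin/bash
# Usage: save_layout OUTPUT_DIRECTORY
save_layout() {
    local outdir=$1
    mkdir -p "$outdir" || return 1

    # Mounted filesystems and their sizes, in kilobytes.
    df -k > "$outdir/df.txt"

    # The filesystem table, where one exists.
    if [ -r /etc/fstab ]; then
        cp /etc/fstab "$outdir/fstab"
    fi

    # Partition tables for all disks the kernel knows about.
    if command -v fdisk > /dev/null 2>&1; then
        fdisk -l < /dev/null > "$outdir/fdisk.txt" 2>&1 || true
    fi
    return 0
}

# Example (as root, target directory is an assumption):
#   save_layout /etc/disk-layout
```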

I like tar because of its universal access -- I can retrieve these
archives from any system, anywhere.  Not just Linux, not just Unix.
Other backup/recover tools offer greater functionality, but generally
reduce the flexibility of access.



------------------------------------
When to back it up
------------------

Early and often.

There are complex "Tower of Hanoi" backup schedules designed to provide
maximum backup coverage while minimizing use of tape and time involved
in backups.  You can find these documented in a good system
administration text (see below for recommendations).  For a typical
single-user system, periodic full archives on a set of rotated tapes
should be reasonably sufficient.  My own schedule is to perform full
backups once or twice a week.
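
For the curious, the tape choice in a Tower of Hanoi rotation can be
computed from the backup session number.  The sketch below assumes the
common formulation in which tape 0 is used every second session, tape 1
every fourth, and so on, with the last tape absorbing the remainder;
hanoi_tape is a hypothetical helper and tapes are numbered from 0.

```shell
#!/bin/bash
# Usage: hanoi_tape SESSION NTAPES   (SESSION counts from 1)
# Prints the index of the tape to use for that session.
hanoi_tape() {
    local n=$1 tapes=$2 idx=0
    # Count how many times 2 divides the session number, but never go
    # past the last tape in the set.
    while [ $((n % 2)) -eq 0 ] && [ "$idx" -lt $((tapes - 1)) ]; do
        n=$((n / 2))
        idx=$((idx + 1))
    done
    echo "$idx"
}

# Rotation for eight sessions with a set of three tapes.
for session in 1 2 3 4 5 6 7 8; do
    echo "session $session: tape $(hanoi_tape "$session" 3)"
done
```

With three tapes the pattern is 0 1 0 2, repeated, so the least-used
tape always holds the oldest archive.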


------------------------------------
Further Reading
---------------

    Evi Nemeth, Garth Snyder, Trent R. Hein, _UNIX System
    Administration Handbook, Third Edition_, Prentice Hall, (c) 2000,
    ISBN 0-13-020601-6

    AEleen Frisch, _Essential System Administration_, O'Reilly &
    Associates, (c) 1995, ISBN 1-56592-127-5
    http://www.ora.com/catalog/esa2/

    M Carling, Stephen Degler, James Dennis, _Linux System
    Administration_, New Riders Press, (c) 2000, ISBN 1-56205-934-3

    W. Curtis Preston, _Unix Backup and Recovery_, O'Reilly &
    Associates, (c) 1999, ISBN 1-56592-642-0
    http://www.ora.com/catalog/unixbr/


------------------------------------
Sample backup script
--------------------

My system backup script follows.  It backs up a series of directories
using tar, verifies the archives, and shouts frequently to all open
terminals what's going on.

I'm not saying it's the pinnacle of backup scripts, but it works for me.
I typically submit it to 'batch' a couple of times a week, and it runs
unattended for several hours.

--------------------< begin system-backup >--------------------
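#!/bin/bash

# Create backups of /etc, /home, /usr/local, and selected other
# directories.  Each directory becomes its own tar archive on the tape.
PATH=/bin:/usr/bin

backupdirs="/etc /root /boot /home /usr/local /var/backups /var/cache/apt \
/var/lib /var/log /var/www"

# Overall result: set to 1 if any archive fails to write or verify.
status=0

mt rewind
for path in $backupdirs
do
    echo "System backup on $path" | wall
    # Record tar failures here; $? after the loop would only reflect
    # the final sleep.
    tar cf /dev/nst0 $path || status=1
    sleep 2
done

echo "System backups complete, status: $status" | wall
echo "Now verifying system backups" | wall

mt rewind

for path in $backupdirs
do
    echo "Verifying $path...." | wall
    # Test tar's own exit status, not that of a later echo or wall.
    if tar tf /dev/nst0
    then
        echo "$path: verified" | wall
    else
        echo "$path: error(s) in verify" | wall
        echo "$path: error(s) in verify" 1>&2
        status=1
    fi
    # Skip the filemark to reach the start of the next archive.
    mt fsf 1
done

mt rewoffl

echo "Please remove backup tape" | wall
exit $status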
--------------------< end system-backup >--------------------
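
Since the script writes one archive per directory, restoring a single
directory means skipping forward to the right archive first.  The sketch
below only *prints* the commands rather than running them.  It assumes
one filemark per archive (as the verify loop's "mt fsf 1" implies) and
the same directory order as in backupdirs; restore_commands is a
hypothetical helper name.

```shell
#!/bin/bash
backupdirs="/etc /root /boot /home /usr/local /var/backups /var/cache/apt \
/var/lib /var/log /var/www"

# Usage: restore_commands DIRECTORY
# Prints the commands that would position the tape at DIRECTORY's
# archive and extract it.
restore_commands() {
    local want=$1 index=0 path
    for path in $backupdirs; do
        if [ "$path" = "$want" ]; then
            echo "mt rewind"
            # Archive number N sits N filemarks from the start of tape.
            if [ "$index" -gt 0 ]; then
                echo "mt fsf $index"
            fi
            echo "tar xvf /dev/nst0"
            return 0
        fi
        index=$((index + 1))
    done
    echo "$want: not in backup list" 1>&2
    return 1
}

restore_commands /home
```

Run the printed commands from a scratch directory: GNU tar strips the
leading "/" when archiving, so files are extracted relative to wherever
you are.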

------------------------------------------------------------------------

-- 
Karsten M. Self <kmself@ix.netcom.com>     http://www.netcom.com/~kmself
 Evangelist, Opensales, Inc.                    http://www.opensales.org
  What part of "Gestalt" don't you understand?      There is no K5 cabal
   http://gestalt-system.sourceforge.net/        http://www.kuro5hin.org
GPG fingerprint: F932 8B25 5FDD 2528 D595 DC61 3847 889F 55F2 B9B0
