
Re: Best way to duplicate HDs



On Tue, 1 Jan 2002 22:49, Jason Lim wrote:
> Right now one of the things we are testing is:
> 1) mount up the "backup" hard disk
> 2) cp -a /home/* /mnt/backup/home/
> 3) umount "backup" hard disk
>
> The way we do it right now is:
> 1) a backup server with a few 60Gb HDs
> 2) use "dump" to cp the partitions over to the backup server
> 3) use "export" to restore stuff
> (not very elegant... which is why we're trying to set up a better way)
>
> Unless a cracker spends quite a bit of time going through everything, they
> would most probably miss this part. True... if they do spend enough time
> going through everything, then as you said, it is potentially gone.

Yes.  However, if you NFS export the file system to the backup machine, then
as long as the backup machine is safe there shouldn't be any problems.

NFS over full-duplex 100baseT performs pretty well, so why not use it?
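
A minimal sketch of pulling the copy from the backup machine over NFS.  The
host name "server", the export /home, the mount point /mnt/src and the
destination /backup/home are all made-up names for illustration, and the
server is assumed to export the file system read-only to the backup host.  By
default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: run on the backup machine to pull a copy over NFS.
# Assumed on the server (hypothetical host name), in /etc/exports:
#   /home  backuphost(ro,no_root_squash)

# Print commands by default; set DRY_RUN=0 to really run them (needs root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pull_backup EXPORT MOUNTPOINT DEST
pull_backup() {
    export_spec="$1"
    mount_point="$2"
    dest="$3"
    # Mount read-only so the backup machine cannot alter the source.
    run mount -t nfs -o ro "$export_spec" "$mount_point" || return 1
    # Copy everything, preserving ownership, permissions and timestamps.
    run cp -a "$mount_point/." "$dest/"
    status=$?
    # Always unmount, even if the copy failed.
    run umount "$mount_point"
    return "$status"
}
```

For example, "pull_backup server:/home /mnt/src /backup/home" prints the
mount, cp and umount commands; with DRY_RUN=0 it runs them.  Since the live
server never mounts the backup disk, someone who breaks into the server has
nothing local to find.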

> > The most common problem in this regard I've encountered when running ISPs
> > (seen at many sites with all distributions of Linux, Solaris, and AIX) is
> > when someone makes a change which results in a non-bootable system.  Then
> > several months later the machine is rebooted and no-one can remember what
> > they changed.
>
> Haven't had that yet... because every time we make a massive system change
> that might upset the "rebootability" of the server (eg. fiddle with lilo,
> partition settings, etc.) we do a real reboot. This might not be practical
> on a system that needs 99.9999% uptime, but ensures it will work in
> future.

Lots of things may seem like minor changes but have unforeseen impacts in 
complex systems.  I've seen machines become unbootable because of changes to 
/etc/hosts, changes to NFS mounting options (worst case is machines mounting 
each other's file systems and having mounts occur before exports so that they 
can't boot at the same time), changes to init.d scripts (a script that hangs 
will stop the boot process), and daemons that hang on startup when there is 
no disk space (so lack of space triggers a crash and an automatic reboot and 
then the machine is dead).

Also when you have multiple machine dependencies you sometimes have to reboot 
all machines to test everything properly.

Unfortunately some of the companies I work for refuse to allow me to perform 
basic tests such as "reboot all machines at once", so if there is ever a 
power failure then they are likely to discover some problem...

> > > but the system must stay up and operational at all times.
> >
> > LVM.  Create a snapshot of the LV and then use dd to copy it.
>
> Eep... setting up LVM for the SOLE purpose of doing this mirroring? Seems
> a bit like overkill and would add an extra level of complexity :-/

True.  Much easier to use software RAID.

> > I think that probably your whole plan here is misguided.  Please tell us
> > exactly what you are trying to protect against and we can probably give
> > better advice.
>
> I know of a few hardware solutions that do something like this, but would
> like to do this in software. They claim to perform a "mirror" of one HD to
> another HD while the system is live and in use.

It's called RAID-1.

> I have no idea how it does
> this without corruption of some type (as you mentioned above, doing dd on
> a live HD will probably cause errors, especially if the live HD is in
> use). For example, http://www.arcoide.com/ . To quote the function we're
> looking at " the DupliDisk2 automatically switches to the remaining drive

So set up three disks in a software RAID-1 configuration with one disk being
marked as a "spare" disk.  Then have a script run from a cron job every day
which marks the first disk as failed.  This will cause the spare disk to be
added to the RAID set and have the data copied to it.  After setting one disk
as failed the script can then add it back to the RAID as a spare disk.

This means that apart from the RAID copying time (at least 20 minutes on an 
idle array - longer on a busy array) you will always have two live active 
copies of your data.  Before your script runs you'll also have an old 
snapshot of your data which can be used to recover from operator error.

This will do everything that the arcoide product appears to do.
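
A minimal sketch of the rotation script described above.  It assumes mdadm is
the tool managing the software RAID (the raidtools commands would serve the
same purpose), and /dev/md0 and /dev/hda1 are example names only.  By default
it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: rotate one member of a RAID-1 set out and back in as a spare.
# Assumptions: mdadm manages the array; array and member names are examples.

# Print commands by default; set DRY_RUN=0 to really run them (needs root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rotate_member ARRAY MEMBER
rotate_member() {
    array="$1"
    member="$2"
    # Mark the member as failed; the rebuild onto the spare should start.
    run mdadm --manage "$array" --fail "$member" || return 1
    # Remove the failed member from the array.
    run mdadm --manage "$array" --remove "$member" || return 1
    # Add it back.  The array already has its full number of members
    # (one active, one rebuilding), so it should come back as a spare.
    run mdadm --manage "$array" --add "$member" || return 1
}
```

Calling "rotate_member /dev/md0 /dev/hda1" prints the three mdadm commands; a
daily cron job would call the same function with DRY_RUN=0.  Check
/proc/mdstat afterwards to confirm the rebuild onto the old spare has
started.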

-- 
http://www.coker.com.au/bonnie++/     Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/       Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/     My home page


