
Re: Problems with making hardlink-based backups



On Sat, Aug 15, 2009 at 4:35 AM, Rob Owens<rowens@ptd.net> wrote:
> You might want to check out BackupPC.  It uses hardlinks and
> compression.  It's got a web-based GUI which makes it pretty easy to
> find statistics on disk space used, on a per-server basis.
>

I've been researching backuppc, and it seems like it wants to store
everything in a pool, including the latest backup. Is there a way to
keep the latest backup outside the pool area?

Reason being that while the pool is very space-efficient, the layout
is somewhat opaque, and afaict it's not very straightforward to get to
the actual backed-up files (for scripts, admin users, etc., logged
into the backup server).

Places where I'm foreseeing problems:

1) Offsite backups.

My current scripts use rsync to update the latest snapshots (for each
user, server, etc.) over to a set of external drives. With backuppc,
I'll probably have to find the correct backuppc script incantation (or
hack something together) to restore the latest backup to a temporary
location on the backup server before copying it over to the external
drive.
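
For example, something like (a rough, untested sketch; the
BackupPC_tarCreate options are from memory and all paths are made up):

    # Restore the most recent backup of "fileserver" to a staging dir,
    # then mirror it onto the external drive.
    STAGE=/var/tmp/staging/fileserver
    mkdir -p "$STAGE"
    # -n -1 should mean "the most recent backup"
    /usr/share/backuppc/bin/BackupPC_tarCreate -h fileserver -n -1 -s / . \
        | tar -xf - -C "$STAGE"
    rsync -aH --delete "$STAGE/" /mnt/external/fileserver/
    rm -rf "$STAGE"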

Problems:

a. Complicated

b. Going to be slow (slower than if there were an existing directory)

c. Going to use up a lot of extra hard drive space on the backup
server to store the restored snapshot (e.g. for backed-up file
servers), unless I work out something ugly whereby the uncompressed
backuppc files are hardlinked into a new directory structure... (which
would be incredibly ugly).

d. Inefficient - if only a few files have changed on a huge backed-up
filesystem, you still need to restore the entire snapshot out of the
backuppc pool.

2) Admin-friendliness.

It's simpler for admins to browse through files in a directory
structure on the backup server, on a command line (or with winscp or
samba or whatever), than to have to go through a web frontend.
99% of the time they're looking for stuff from the latest snapshot, so
it's acceptable for them (or myself) to have to run special commands
to get to the older versions. But the latest snapshot I do actually
want present on the hard drive (rather than hidden away in a pool).
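
For example, with a real directory tree, something like this is all it
takes to find last week's documents (hypothetical layout):

    # assuming the latest snapshots live under /backups/latest/<host>/
    find /backups/latest/fileserver -name '*.odt' -mtime -7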

3) Utility-friendliness.

With a directory structure, I can run du to determine which files are
huge, or use other unixy tools. Without it, scripts, admins, and I
have to go through the backuppc-approved channels... unnecessary
complication imo.
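
For example (again assuming a hypothetical /backups/latest/<host> layout):

    # which hosts' latest snapshots are taking the most space, in MB?
    du -xsm /backups/latest/* | sort -n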

---

I guess one way to do this is to use the regular rsync-based backup
methods to make/update the latest snapshot, and then back that up with
backuppc. But that has the following disadvantages:

1) Lots more disk usage.

Backuppc would be making an independent copy of all the data. It won't
be, e.g., making hardlinks against the latest snapshot, or doing
reverse incrementals, or anything like that.

2) Redundant and complicated.

Backuppc is meant to be a "one-stop", automated thing. If I'm already
handling scheduling, the actual transports, etc. from my scripts, then
it's redundant. All it's being used for is its pooled approach, which
still has the above problems.

---

Basically, what I would require from backuppc is a way to tell it to
preserve a local copy of the latest snapshots (in easy-to-find
locations on the backup server, so admins or scripts can use them
directly) and to move only the older versions into the pool... while
at the same time taking advantage of the latest snapshot to conserve
backup-server hard drive space (reverse incrementals, hardlinking to
it, etc.).

Does anyone who is familiar with backuppc know if the above is possible?

(Although I kind of doubt it at this point. My use cases seem to break
the backuppc design ^^; )

I should probably post about this to the backuppc mailing lists too...
their users would have a lot more relevant experience. In the
meantime, I'll probably continue to use a pruned-hardlinks approach
(roughly the sketch below).
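
For the curious, the core of that approach is roughly this (a minimal
sketch; the host and path names are made up):

    # Unchanged files get hardlinked against the previous snapshot, so
    # a new snapshot only costs the space of whatever actually changed.
    today=$(date +%Y-%m-%d)
    rsync -a --delete \
        --link-dest=/backups/fileserver/latest \
        fileserver:/export/ "/backups/fileserver/$today/"
    ln -nsf "$today" /backups/fileserver/latest
    # "pruning" is then just deleting old snapshot directories; shared
    # file data survives until its last hardlink goes away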

David.

