Re: Problems with making hardlink-based backups



On Fri, Aug 14, 2009 at 08:43:32AM +0200, David wrote:
> Thanks for your suggestion, and I have heard of rsnapshot.
> 
> Although, actually removing older snapshot directories isn't really the problem.
> 
> The problem is, if you have a large number of such backups (perhaps
> one per server), then finding out where harddrive space is actually
> being used, is problematic (when your backup server starts running low
> on disk space).

Keep each server's backup in a distinctly separate location. That
should make it clear which machines are burning up space.

> 
> du worked pretty well with rdiff-backup, but is very problematic with
> a large number of hardlink-based snapshots, which each have a complete
> "copy" of a massive filesystem (rather than just info on which files
> changed).

But they're not copies; they're hardlinks. I guess I don't understand
the problem. In a scheme like that used by rsnapshot, a file is only
*copied* once. If it remains unchanged then the subsequent backup
directories only carry a hardlink to the file. When older backups are
deleted, the hardlinks keep the file around, but no extra room is
used. There are only *pointers* to the file lying around. Then when
the file changes, a new copy will be made and subsequent backups will
hardlink to the new file. Now you'll be using the space of two files
with different sets of hardlinks pointing to them. (I'm sure you know
this, just making sure we are on common ground).
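
To make that concrete, here is a minimal Python sketch (not how
rsnapshot itself is implemented; the directory names daily.1 and
daily.0 and the file data.txt are made up for illustration). It shows
that a hardlinked "copy" in a newer snapshot is just a second name for
the same inode:

```python
import os
import tempfile


def hardlink_demo():
    """Create a file, hardlink it into a second directory, report the result."""
    with tempfile.TemporaryDirectory() as root:
        older = os.path.join(root, "daily.1")   # hypothetical snapshot names
        newer = os.path.join(root, "daily.0")
        os.mkdir(older)
        os.mkdir(newer)

        original = os.path.join(older, "data.txt")
        with open(original, "w") as fh:
            fh.write("unchanged contents\n")

        # The newer snapshot gets a second name for the same inode, not a copy.
        linked = os.path.join(newer, "data.txt")
        os.link(original, linked)

        a, b = os.stat(original), os.stat(linked)
        return {
            "same_inode": (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino),
            "link_count": a.st_nlink,
        }


if __name__ == "__main__":
    print(hardlink_demo())
```

Both names report the same device and inode number, and the link
count goes up by one for each snapshot that still references the file.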

> 
> I guess I could do something like removing the oldest snapshot
> directories from *all* the backups, until there is enough free space.
> But that's kind of wasteful. Like, if I have one server that didn't
> change much over 2 years, then I can only keep eg the last 2-3 weeks
> of backups, because there is another server that has a huge amount of
> file changes in the same period. And not being able to use "du" is
> kind of annoying (actually, "locate" is also having major problems, so
> I disabled it on the backup server).

If you are using hardlinks, and nice discrete directories for each
machine, then a machine that has infrequent changes will not use a lot
of space because the files don't change. Other than the minimal space
used by the hardlinks themselves, you could save a *lot* of "backups"
of an unchanged file and use the same space as the one file because
there is only one actual copy of the file.
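
If du is too painful, something along these lines might do as a rough
per-server report. It is only a sketch under a few assumptions: the
function name snapshot_usage is mine, it looks at regular files only,
and it adds up apparent file sizes (st_size) rather than allocated
blocks. It walks one server's backup directory and counts each inode
once, next to the "naive" total you'd get by treating every hardlink
as a full copy:

```python
import os
import stat


def snapshot_usage(root):
    """Return (unique_bytes, naive_bytes) for regular files under `root`.

    unique_bytes counts each inode once; naive_bytes counts every name.
    """
    seen = set()
    unique = naive = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes, etc.
            naive += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return unique, naive
```

Run once per server directory, the first number is roughly what that
machine's backups really cost you, and the gap between the two numbers
is what the hardlinks are saving.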

That said, the more often you back up rapidly changing data, the bigger
the backup gets because you store complete copies for each change. You
have to balance the needs of each machine (and probably have a
different scheme for each machine). How important is it to have access
to a specific change in a file? And for how long do you need access to
that specific change? These sorts of questions should help with these
decisions.


> 
> That's why I started working on a set of pruning/unpruning scripts,
> which basically "move" redundant info (the vast majority) over into
> compressed files (with ability to move out again later). Kind of like
> moving the snapshot-based approach closer to how rdiff-backup works
> (but, not chewing up huge amounts of ram and being hard to diagnose).
> That way admins can in theory more easily check where space is being
> used (but at the cost of not having quick access to earlier complete
> server snapshots).

You should be able to look at the difference between disk usage over
different time periods and figure out your "burn rate". And using a
hardlink approach, you can easily archive older backups and then
remove them without laborious pruning. This is because if you delete a
file that has multiple hard links to it, the file will still exist
until *all* the hardlinks are gone. So to remove a snapshot from
last week that contains files that haven't changed, you just remove
it. The files that you still need will still be there because you're
hardlinked to them.
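
If you want to know in advance how much space deleting a given
snapshot would actually give back, a sketch like the following might
help. The function name reclaimable_bytes is made up, it only
considers regular files, and it uses apparent sizes. The idea is that
a file's data is freed only when its last link disappears, so an inode
counts as reclaimable only when every one of its links lives inside
the snapshot being removed:

```python
import os
import stat
from collections import Counter


def reclaimable_bytes(snapshot):
    """Bytes that deleting `snapshot` would free (regular files only)."""
    names_here = Counter()   # links to each inode found inside this snapshot
    info = {}                # one stat result per inode
    for dirpath, _dirnames, filenames in os.walk(snapshot):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            names_here[key] += 1
            info[key] = st
    # An inode is freed only if no link to it exists outside the snapshot.
    return sum(
        st.st_size
        for key, st in info.items()
        if names_here[key] == st.st_nlink
    )
```

Files that newer snapshots still link to contribute nothing here,
which is exactly why removing an old snapshot full of unchanged files
frees very little and loses nothing you still need.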

> 
> But I assume there must be better existing ways of handling this kind
> of problem, since backups aren't exactly something new.

I suspect I'm telling you stuff you already know and I apologize if I
appear condescending. The odds are you probably know more about
backups than I do. hth.

A

> 
> On Thu, Aug 13, 2009 at 5:48 PM, Andrew
> Sackville-West<andrew@farwestbilliards.com> wrote:
> > On Thu, Aug 13, 2009 at 09:20:17AM +0200, David wrote:
> >> Hi list.
> > [...]
> >> 3) Existing tools for managing hardlink-based snapshot directories
> >> etc.
> >
> > maybe rsnapshot is what you're after. It does hardlinked snapshots
> > with automagical deletion of older backups and configurable frequency
> > etc. I quite like it, though I'm not using it for high-volume stuff.
> >
> > One little caveat that always seems to get me: the daily won't run
> > until you've completed enough hourlies, the weekly won't run until
> > you've completed a week's worth of dailies, etc. Very disconcerting
> > the first few days of use.
> >
> > A
> >
