
Re: Problems with making hardlink-based backups



On Mon, Aug 17, 2009 at 6:26 PM, Andrew
Sackville-West<andrew@farwestbilliards.com> wrote:
> Here's another question: what is stored in all these millions of
> files?
> [...]

Basically, the lion's share would be tonnes of user-generated files,
for example huge numbers of image files (and thumbnails) stored in
directory structures on one of the file servers. Other examples would
be extensive music & sound libraries, several debian/ubuntu/etc
mirrors, and so on.

About tarring before backing up: yeah, that's possible too (for some
types of data/directory layouts). But then something on the file
server side needs to check whether the tars are still up to date, and
those tars will also take up a lot of precious hard drive space on
the file server :-(. Unless you mean removing the original data...
which is problematic in a few ways. And of course, storing different
versions of those tars (e.g. users move files around at the source)
is also problematic.
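
Just to illustrate what I mean by "something needs to check": in my
head it would be a small script along these lines (purely
hypothetical, the paths and layout are made up), which re-tars a
directory only when something under it is newer than the existing
tarball:

  import os
  import tarfile

  def newest_mtime(root):
      """Most recent mtime of anything under root."""
      latest = os.lstat(root).st_mtime
      for dirpath, dirnames, filenames in os.walk(root):
          for name in dirnames + filenames:
              try:
                  latest = max(latest, os.lstat(os.path.join(dirpath, name)).st_mtime)
              except OSError:
                  pass  # a file vanished while we were walking
      return latest

  def refresh_tar(src_dir, tar_path):
      """Rebuild tar_path only if src_dir changed since it was last built."""
      if os.path.exists(tar_path) and os.path.getmtime(tar_path) >= newest_mtime(src_dir):
          return  # still up to date, nothing to do
      tar = tarfile.open(tar_path, "w")  # "w:gz" if compression is worth the CPU
      try:
          tar.add(src_dir, arcname=os.path.basename(src_dir))
      finally:
          tar.close()

But note that newest_mtime() already has to stat every single file on
every run, so on millions of files the freshness check is itself
expensive -- and that's before worrying about versioning the tars.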

Basically... as you say, it would be the tail wagging the dog. Things
would get a lot more complicated & fragile, and in exchange I'd get a
lot of other, more serious backup problems that are harder to work
around than the current issues.

About moving to a database: well, the filesystem is already a
database :-). And keeping backups of that (multi-TB) database itself
would be a major problem. Not to mention that users and software
would then have to go through some other layer to get to their
files... don't want to go there.. my head hurts ^^;

The file servers themselves do have a large number of files, but that
isn't really the problem. The problem is in the backup software,
which runs into trouble handling history for those backups (either
using massive amounts of memory/CPU, or creating massive numbers of
hardlinks, and so on).

Basically, rdiff-backup was perfect for a while. But then we upgraded
the server to Lenny, and it stopped working T_T. I think
rdiff-backup's author must have changed something that now causes
huge RAM usage for large file lists, or other per-file data of some
kind. IMO that's unnecessary (it could just use something like a set
of Python iterators in a clever way, or work with incremental file
lists like rsync), but I didn't get any useful replies on their
mailing list when I mentioned my problem and gave a few ideas.
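
To give a rough idea of the iterator thing I suggested (just my own
toy sketch, not how rdiff-backup or rsync actually work, and the
paths in the example are made up): walk the tree lazily in a fixed
sorted order, and merge the previous listing against the current one
entry by entry, so memory use stays roughly constant no matter how
many files there are:

  import os

  def walk_metadata(root, rel=()):
      """Lazily yield (path_components, mtime, size), sorted by path."""
      for name in sorted(os.listdir(os.path.join(root, *rel))):
          path = os.path.join(root, *(rel + (name,)))
          st = os.lstat(path)
          if os.path.isdir(path) and not os.path.islink(path):
              for entry in walk_metadata(root, rel + (name,)):
                  yield entry
          else:
              yield rel + (name,), st.st_mtime, st.st_size

  def diff_streams(old, new):
      """Merge two sorted streams, yielding ('added'|'removed'|'changed', key)."""
      o, n = next(old, None), next(new, None)
      while o is not None or n is not None:
          if n is None or (o is not None and o[0] < n[0]):
              yield "removed", o[0]
              o = next(old, None)
          elif o is None or n[0] < o[0]:
              yield "added", n[0]
              n = next(new, None)
          else:
              if o[1:] != n[1:]:
                  yield "changed", n[0]
              o, n = next(old, None), next(new, None)

  # e.g. stream yesterday's snapshot against the live tree, one entry at a time:
  #   for change, key in diff_streams(walk_metadata("/backups/latest"),
  #                                   walk_metadata("/srv/files")):
  #       ...

Nothing ever holds more than the current entry from each stream, which
is the whole point.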

So for now it's a combination of ugly hacks: hardlink-based pruning
for history snapshots, and blindly deleting older backup generations
to get space back when needed. At least until I find a better
solution.
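
For what it's worth, the "blindly deleting older generations" part
really is as dumb as it sounds -- assuming one dated directory per
generation (the layout and the keep-14 policy below are made up for
illustration), it boils down to:

  import os
  import shutil

  def prune_generations(snapshot_root, keep=14):
      """Delete all but the newest `keep` snapshot directories."""
      generations = sorted(
          d for d in os.listdir(snapshot_root)
          if os.path.isdir(os.path.join(snapshot_root, d))
      )
      # ISO-dated directory names sort chronologically, oldest first
      for old in generations[:-keep]:
          shutil.rmtree(os.path.join(snapshot_root, old))

Since the snapshots are hardlink-based, deleting an old generation
only frees the blocks of files that aren't still linked from a newer
one, so it buys back less space than you'd hope.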

Anyway, thanks for your ideas :-)

David.

