
Re: deduplicating file systems: VDO with Debian?



On Wed, 09 Nov 2022 13:52:26 +0100 hw <hw@adminart.net> wrote:

> Does that work? Does bees run as long as there's something to deduplicate and
> only stops when there isn't?

Bees is a service (daemon) which runs 24/7, watching the btrfs transaction state (the checkpoints). If there are new transactions, it kicks in. But it's a niced service (see man nice, man ionice). If your backup process has a higher priority than "idle" (which is typically the case) and produces high load, it may block bees until the backup is finished (maybe!).

> I thought you start it when the data is in place and
> not before that.

That's the case with fdupes, duperemove, etc.

> You can easily make changes to two full copies: "make changes" meaning that you only change what has changed since the last time you made the backup.

Do you mean making changes to one of the backups? I never considered changing my backups. I do make changes to the live data, and the next time the incremental backup runs, these changes get into backup storage. Making changes to the backups themselves ... I wouldn't call those backups anymore.

Or do you mean you have two copies and alternately "update" them to reflect the live state? I do not see a benefit in this, at least if both reside on the same storage system. It wastes storage space (duplicated files). One copy with many incremental backups would be better. And if you plan to deduplicate both copies anyway, simply use a backup solution with incremental backups.

Syncing two parallel copies means transferring every change a second time, although it was already transferred for the first copy. And the second copy is still in an older state until the moment you update it.

Again, I prefer a single process for having one[sic] consistent backup storage with a working history.

Two copies in two different locations is another story; that can indeed have benefits.

> For me only the first backup is a full backup, every other backup is
> incremental.

> When you make a second full backup, that second copy is not incremental. It's a
> full backup.

Correct. That's the reason I make incremental backups. And by incremental backups I mean that I can restore "full" backups for many points in time: every day of the last week, one day for every month of the year, even several days of past years and so on. But all of those "full" backups together don't even take up the space of two full backups. It's less in size but offers more.
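To illustrate why many restorable "full" backups can cost less than two full copies, here is a toy sketch of a content-addressed block store. This is not how any particular backup tool works; the fixed 4 KiB block size, the class and all function names are my own assumptions for the example. Each snapshot is just a list of block hashes, and a block that is already stored costs nothing extra.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks, chosen only for illustration


class BlockStore:
    """Toy content-addressed store: every unique block is kept exactly once."""

    def __init__(self):
        self.blocks = {}     # SHA-256 hex digest -> block bytes
        self.snapshots = {}  # snapshot name -> ordered list of block digests

    def backup(self, name, data):
        """Record a snapshot; only blocks not seen before use new space."""
        manifest = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            manifest.append(digest)
        self.snapshots[name] = manifest

    def restore(self, name):
        """Rebuild the complete ("full") data of any stored snapshot."""
        return b"".join(self.blocks[d] for d in self.snapshots[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


def make_data(n_blocks):
    # each block is a distinct 4096-byte pattern (a 32-byte digest repeated 128 times)
    return b"".join(hashlib.sha256(str(i).encode()).digest() * 128
                    for i in range(n_blocks))


def change_block(data, index):
    # flip the first byte of one block, as if a single block had been modified
    changed = bytearray(data)
    changed[index * BLOCK_SIZE] ^= 0xFF
    return bytes(changed)


day1 = make_data(100)
day2 = change_block(day1, 5)
day3 = change_block(day2, 42)

store = BlockStore()
for name, data in [("day1", day1), ("day2", day2), ("day3", day3)]:
    store.backup(name, data)

print(store.restore("day1") == day1)        # True
print(store.stored_bytes(), 3 * len(day1))  # 417792 1228800
```

Three fully restorable snapshots end up costing 102 blocks instead of the 300 that three full copies would need.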

For me a single full backup takes several days (terabytes via DSL upload to the backup location), while incremental backups are MUCH faster (typically a few minutes if not much has changed). So I use the latter.
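A quick back-of-the-envelope calculation shows the order of magnitude. The numbers here are assumptions for illustration only (2 TB full backup, 500 MB of daily changes, 40 Mbit/s DSL upload), and the formula ignores protocol overhead and line fluctuations:

```python
def transfer_seconds(size_bytes, uplink_bits_per_s):
    """Idealized transfer time: ignores protocol overhead and line fluctuations."""
    return size_bytes * 8 / uplink_bits_per_s


UPLINK = 40e6      # assumed DSL upload rate: 40 Mbit/s
FULL = 2e12        # assumed full backup size: 2 TB
INCREMENT = 500e6  # assumed daily change set: 500 MB

full_days = transfer_seconds(FULL, UPLINK) / 86400
incr_minutes = transfer_seconds(INCREMENT, UPLINK) / 60
print(f"full: {full_days:.1f} days, incremental: {incr_minutes:.1f} minutes")
# full: 4.6 days, incremental: 1.7 minutes
```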

> What difference does it make whether the deduplication is block based or somehow
> file based (whatever that means).

File-based deduplication means files get compared as a whole. Result: two big, nearly identical files both need to be stored in full, because they differ. Take, for example, a backup of a virtual machine image where the VM got started between two backup runs. More than 99% of the image is the same as before, but because some log was written inside the VM image, the files differ. They are nearly identical, even in the position of the identical data.

Block-based deduplication can find parts of a file that are exclusive (changed blocks) and other parts that are shared (blocks with the same content):

#####
# btrfs fi du file1 file2

     Total   Exclusive  Set shared  Filename
   2.30GiB    23.00MiB     2.28GiB  file1
   2.30GiB   149.62MiB     2.16GiB  file2
#####
Here both files share data but also have their own exclusive data.
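A toy comparison of the two approaches on a simulated VM image, in the spirit of the "Exclusive" / "Set shared" columns above. This is only a sketch under my own assumptions (fixed 4 KiB blocks, SHA-256 hashes, a made-up image); it is not how btrfs or bees actually find duplicate extents.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks; real tools differ


def file_digest(data):
    # file-based deduplication: one hash for the whole file
    return hashlib.sha256(data).hexdigest()


def shared_and_exclusive(a, b):
    """Split b into bytes whose block also occurs in a (shared) and the rest (exclusive)."""
    known = {hashlib.sha256(a[i:i + BLOCK_SIZE]).hexdigest()
             for i in range(0, len(a), BLOCK_SIZE)}
    shared = exclusive = 0
    for i in range(0, len(b), BLOCK_SIZE):
        block = b[i:i + BLOCK_SIZE]
        if hashlib.sha256(block).hexdigest() in known:
            shared += len(block)
        else:
            exclusive += len(block)
    return shared, exclusive


# hypothetical "VM image": 1000 distinct 4 KiB blocks
image1 = b"".join(hashlib.sha256(str(i).encode()).digest() * 128
                  for i in range(1000))
# the second backup differs in one block only, like a log line written in the guest
image2 = bytearray(image1)
image2[300 * BLOCK_SIZE:300 * BLOCK_SIZE + 11] = b"log message"
image2 = bytes(image2)

print(file_digest(image1) == file_digest(image2))  # False: file-based dedup stores both in full
print(shared_and_exclusive(image1, image2))        # (4091904, 4096)
```

File-based comparison only sees "different", while the block-based view finds that all but one 4 KiB block of the second image can be shared.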

> I'm flexible, but I distrust "backup solutions".

I would say it depends. I distrust everything too, but some sane solutions I may distrust a little less than my "self-built" one. ;-)

Don't trust your own solution more than others "on principle", without some real reason to distrust the others.

> Sounds good. Before I try it, I need to make a backup in case something goes
> wrong.

;-)

regards
hede

