Re: deduplicating file systems: VDO with Debian?
On Wed, 09 Nov 2022 13:52:26 +0100 hw <hw@adminart.net> wrote:
> Does that work? Does bees run as long as there's something to deduplicate and
> only stops when there isn't?
Bees is a service (daemon) which runs 24/7 watching btrfs transaction
state (the checkpoints). If there are new transactions then it kicks in.
But it's a niced service (man nice, man ionice). If your backup process
has higher priority than "idle" (which is typically the case) and
produces high load it will potentially block out bees until the backup
is finished (maybe!).
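As a rough illustration of what "niced" means here, the following Python sketch builds a command line that starts a job with the lowest CPU priority and the "idle" I/O class. The job name and path (my-dedup-job, /mnt/backup) are hypothetical placeholders, and this is not a claim about how bees itself is launched. Whether the idle I/O class has any effect also depends on the kernel's I/O scheduler.

```python
import shlex


def idle_priority_command(command: list[str]) -> list[str]:
    """Wrap a command so that it runs with minimal CPU and I/O priority.

    ionice -c 3  selects the "idle" I/O scheduling class
    nice -n 19   selects the lowest CPU scheduling priority
    """
    return ["ionice", "-c", "3", "nice", "-n", "19", *command]


# Hypothetical job name and path, used only for illustration.
wrapped = idle_priority_command(["my-dedup-job", "/mnt/backup"])
print(shlex.join(wrapped))
```

A process started this way only gets CPU and disk time that nothing with a higher priority wants, which is why a busy backup run can keep it waiting.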
> I thought you start it when the data is in place and not before that.
That's the case with fdupes, duperemove, etc.
> You can easily make changes to two full copies --- "make changes" meaning that
> you only change what has been changed since last time you made the backup.
Do you mean modifying (making changes to) one of the backups? I never
considered making changes to my backups. I make changes to the live
data, and the next time the incremental backup process runs these
changes get into backup storage. Making changes to the backups themselves ... I
wouldn't call those backups anymore.
Or do you mean you have two copies and alternately "update" these
copies to reflect the live state? I do not see a benefit in this, at
least if both reside on the same storage system. It wastes
storage space (doubled files). One copy with many incremental backups
would be better. And if you plan to deduplicate both copies, simply use a
backup solution with incremental backups.
Syncing two adjacent copies means submitting all changes a second time,
although they were already transferred for the first copy. The second copy is
still in an older state at the moment you update it.
Yet again, I prefer a single process that maintains one[sic] consistent
backup storage with a working history.
Two copies at two different locations are a different story; that can indeed
have benefits.
> > For me only the first backup is a full backup, every other backup is
> > incremental.
> When you make a second full backup, that second copy is not incremental. It's a
> full backup.
Correct. That's the reason I make incremental backups. And by
incremental backups I mean that I can restore "full" backups for
several days: every day of the last week, one day for every month of the
year, even several days of past years and so on. But the whole backup of
all those "full" backups is not even two full backups in size. It's smaller
but offers more.
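To show how many restorable "full" states can take less space than two full copies, here is a minimal Python sketch of a content-addressed backup store. It is only an illustration under simplifying assumptions (whole files are hashed, everything lives in memory, and the class and file names are made up), not how any particular backup tool works.

```python
import hashlib


class BackupStore:
    """Toy store: every snapshot restores in full, but content is kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}                 # digest -> content
        self.snapshots: dict[str, dict[str, str]] = {}    # name -> {path: digest}

    def backup(self, name: str, files: dict[str, bytes]) -> None:
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            # Content that is already stored is not stored a second time.
            self.blobs.setdefault(digest, data)
            manifest[path] = digest
        self.snapshots[name] = manifest

    def restore(self, name: str) -> dict[str, bytes]:
        return {path: self.blobs[d] for path, d in self.snapshots[name].items()}

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


# Two days of hypothetical live data; only b.txt changes between them.
day1 = {"a.txt": b"x" * 1000, "b.txt": b"y" * 1000}
day2 = {"a.txt": b"x" * 1000, "b.txt": b"z" * 1000}

store = BackupStore()
store.backup("day1", day1)
store.backup("day2", day2)
print(store.stored_bytes())  # 3000 bytes stored, versus 4000 for two full copies
```

Both days restore completely, yet only the changed file costs additional space.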
For me a single full backup takes several days (terabytes via DSL upload
to the backup location) while incremental backups are MUCH faster
(typically a few minutes if not much has changed). So I use
the latter.
> What difference does it make whether the deduplication is block based or somehow
> file based (whatever that means).
File based deduplication means files are compared as a whole. Result:
two large, nearly identical files both need to be stored in full, because
they differ.
Take for example a backup of a virtual machine image where the VM was started
between two backup runs. More than 99% of the image is the same as
before, but because some log was written inside the VM image, the two versions
differ. Those files are nearly identical, and the identical data even sits at
the same positions.
Block based deduplication can detect that some parts of a file are exclusive
(changed blocks) while other parts are shared (blocks with the same
content):
#####
# btrfs fi du file1 file2
     Total   Exclusive  Set shared  Filename
   2.30GiB    23.00MiB     2.28GiB  file1
   2.30GiB   149.62MiB     2.16GiB  file2
#####
Here both files share data, but each also has its own exclusive data.
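The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration, not how btrfs or bees work internally: the block size of 4096 bytes, the fixed-offset block hashing and the synthetic "VM image" are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def file_digest(data: bytes) -> str:
    """Hash of the whole file: what file-based deduplication compares."""
    return hashlib.sha256(data).hexdigest()


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash of each fixed-size block: what block-based deduplication compares."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_and_exclusive(a: bytes, b: bytes) -> tuple[int, int]:
    """Return (shared, exclusive) block counts of `a` relative to `b`."""
    blocks_b = set(block_digests(b))
    blocks_a = block_digests(a)
    shared = sum(1 for digest in blocks_a if digest in blocks_b)
    return shared, len(blocks_a) - shared


# Synthetic "VM image": 100 blocks, each with distinct content.
image_before = b"".join(bytes([i]) * BLOCK_SIZE for i in range(100))

# The same image after a log was written: block 42 changed in place.
changed = bytearray(image_before)
changed[42 * BLOCK_SIZE:43 * BLOCK_SIZE] = b"\xff" * BLOCK_SIZE
image_after = bytes(changed)

# File based: the digests differ, so both images are stored in full.
print(file_digest(image_before) == file_digest(image_after))  # False

# Block based: 99 blocks can be shared, only 1 block is exclusive.
print(shared_and_exclusive(image_after, image_before))  # (99, 1)
```

The whole-file comparison sees two different files, while the per-block comparison finds that only one block out of a hundred needs its own storage.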
> I'm flexible, but I distrust "backup solutions".
I would say it depends. I also distrust everything, but I may distrust
a sane solution a little less than my own "self-built" one. ;-)
Don't trust your own solution more than others' "on principle", without
real reasons for distrust.
> Sounds good. Before I try it, I need to make a backup in case something goes
> wrong.
;-)
regards
hede