Re: deduplicating file systems: VDO with Debian?

To: debian-user@lists.debian.org
Subject: Re: deduplicating file systems: VDO with Debian?
From: hede <debian452@der-he.de>
Date: Thu, 10 Nov 2022 11:40:17 +0100
Message-id: <2e4621cbd76e1f7db8ca0754ddd244de@der-he.de>
References: <45998a8cdc61d0945fc5907bc8b30b697b3f5703.camel@adminart.net> <76f44e8de3b785bf4d3c6f2d33f487d4@der-he.de> <ae59abfba405f07d46342aa3e9ac0841ed1ccb9e.camel@adminart.net> <5debce39cbafd85a8f59b6d2d131c1b3@der-he.de> <1059c0b363074c3015b398d59cbd66e67c5d222f.camel@adminart.net>

On Wed, 09 Nov 2022 13:52:26 +0100 hw <hw@adminart.net> wrote:

> Does that work? Does bees run as long as there's something to deduplicate and
> only stops when there isn't?

Bees is a service (daemon) which runs 24/7, watching the btrfs transaction state (the checkpoints). If there are new transactions, it kicks in. But it's a niced service (man nice, man ionice). If your backup process has a higher priority than "idle" (which is typically the case) and produces high load, it will potentially block out bees until the backup is finished (maybe!).
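To illustrate what "niced" means here, a minimal sketch in Python. It is not how bees itself is started; the helper name run_niced is made up for this example, and it only covers CPU niceness (man nice). I/O priority (man ionice) has no counterpart in the Python standard library, so it is left out.

```python
import os
import subprocess
import sys


def run_niced(argv, niceness=19):
    """Run argv as a child process whose CPU niceness is raised by `niceness`.

    Hypothetical helper for illustration only. A higher niceness means the
    scheduler prefers other runnable processes when the CPU is contended.
    """
    def lower_priority():
        # Executed in the child just before exec, so only the child is affected.
        os.nice(niceness)

    return subprocess.run(argv, preexec_fn=lower_priority,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # The child reports its own niceness; os.nice(0) reads it without changing it.
    result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("parent niceness:", os.nice(0))
    print("child niceness: ", result.stdout.strip())
```

A process started this way only gets CPU time that nothing with normal priority wants, which is why a busy backup job can starve it.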

> I thought you start it when the data is in place and
> not before that.


That's the case with fdupes, duperemove, etc.

> You can easily make changes to two full copies --- "make changes" meaning that
> you only change what has been changed since last time you made the backup.

Do you mean modifying (making changes to) one of the backups? I never considered making changes to my backups. I do make changes to the live data, and next time (when the incremental backup process runs) these changes get into backup storage. Making changes to some backups ... I wouldn't call those backups anymore.

Or do you mean you have two copies and alternately "update" these copies to reflect the live state? I do not see a benefit in this, at least if both reside on the same storage system. It wastes storage space (doubled files). One copy with many incremental backups would be better. And if you plan to deduplicate both copies anyway, simply use a backup solution with incremental backups.

Syncing two adjacent copies means submitting all changes a second time, even though they were already transferred for the first copy. And the second copy is still in some older state up to the moment you update it.

Yet again I do prefer a single process for having one[sic] consistent backup storage with a working history.

Two copies in two different locations is another story; that can indeed have benefits.

> > For me only the first backup is a full backup, every other backup is
> > incremental.
> When you make a second full backup, that second copy is not incremental. It's a
> full backup.

Correct. That's the reason I make incremental backups. And by incremental backups I mean that I can restore "full" backups for several days: every day of the last week, one day for every month of the year, even several days of past years and so on. But the whole backup of all those "full" backups is not even two full backups in size. It's smaller in size but offers more.

For me a single full backup needs several days (terabytes via DSL upload to the backup location) while incremental backups are MUCH faster (typically a few minutes if not that much has changed). So I use the latter.
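Why every incremental run can still be restored like a "full" backup can be sketched with a toy store in Python. This is only an illustration under my own assumptions, not how any particular backup tool works; the class SnapshotStore and its methods are invented for this example. Each run records a manifest (path to content hash), and content is stored only if it is not already present.

```python
import hashlib


class SnapshotStore:
    """Toy backup store: each unique file content is kept once, keyed by hash."""

    def __init__(self):
        self.blobs = {}      # sha256 hex digest -> file content
        self.snapshots = []  # one manifest (path -> digest) per backup run

    def backup(self, files):
        """Record a snapshot of `files` (path -> bytes).

        Returns the number of bytes that actually had to be stored
        (a stand-in for "transferred") in this run.
        """
        manifest = {}
        transferred = 0
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in self.blobs:
                # Only content never seen before costs space and upload time.
                self.blobs[digest] = content
                transferred += len(content)
            manifest[path] = digest
        self.snapshots.append(manifest)
        return transferred

    def restore(self, index):
        """Rebuild the complete file set of snapshot number `index`."""
        return {path: self.blobs[d] for path, d in self.snapshots[index].items()}

    def stored_bytes(self):
        return sum(len(content) for content in self.blobs.values())


if __name__ == "__main__":
    store = SnapshotStore()
    day1 = {"a.txt": b"alpha" * 1000, "b.txt": b"beta" * 1000}
    day2 = dict(day1, **{"b.txt": b"beta" * 1000 + b"!"})  # only b.txt changed
    print("first run stored: ", store.backup(day1))
    print("second run stored:", store.backup(day2))
    print("total stored:     ", store.stored_bytes())
    print("day 1 restorable: ", store.restore(0) == day1)
```

Note that this toy compares whole files, so a changed file is stored again in full.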

> What difference does it make whether the deduplication is block based or somehow
> file based (whatever that means).

File based deduplication means files get compared as a whole. Result: two big and nearly identical files need to be stored in full, because they do differ. Say, for example, a backup of a virtual machine image which got started between two backup runs. More than 99% of the image is the same as before, but because some log was written inside the VM image, the files differ. Those files are nearly identical, even in the position of the identical data.

Block based deduplication can find that some parts of a file are exclusive (changed blocks) and other parts are set shared (blocks with the same content):


#####
# btrfs fi du file1 file2

     Total   Exclusive  Set shared  Filename
   2.30GiB    23.00MiB     2.28GiB  file1
   2.30GiB   149.62MiB     2.16GiB  file2
#####
Here both files share data but also have their own exclusive data.
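The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the 4096-byte block size and the function names are my assumptions, and it is not how btrfs computes the "Exclusive" and "Set shared" columns (btrfs accounts in extents).

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def file_level_identical(a, b):
    """File based view: the files either match as a whole or they don't."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash every fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def compare_blocks(a, b, block_size=BLOCK_SIZE):
    """Block based view: count (shared, exclusive) blocks for each file."""
    hashes_a = block_hashes(a, block_size)
    hashes_b = block_hashes(b, block_size)
    common = set(hashes_a) & set(hashes_b)
    exclusive_a = sum(1 for h in hashes_a if h not in common)
    exclusive_b = sum(1 for h in hashes_b if h not in common)
    return {
        "a": (len(hashes_a) - exclusive_a, exclusive_a),
        "b": (len(hashes_b) - exclusive_b, exclusive_b),
    }


if __name__ == "__main__":
    # A 100-block "image"; the second copy differs in one single block.
    blocks = [bytes([i]) * BLOCK_SIZE for i in range(100)]
    image1 = b"".join(blocks)
    blocks[42] = b"\xff" * BLOCK_SIZE
    image2 = b"".join(blocks)
    print("identical as whole files:", file_level_identical(image1, image2))
    print("(shared, exclusive) blocks:", compare_blocks(image1, image2))
```

With file based comparison the two images count as different and both get stored in full; the block based comparison finds that all but one block can be shared.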

> I'm flexible, but I distrust "backup solutions".

I would say it depends. I also distrust everything, but maybe I distrust some sane solution a little less than my "self built" one. ;-)

Don't trust your own solution more than others "on principle", without some real reasons for distrust.

> Sounds good. Before I try it, I need to make a backup in case something goes
> wrong.


;-)

regards
hede
