Re: deduplicating file systems: VDO with Debian?
On 08.11.2022 05:31, hw wrote:
> That still requires you to have enough disk space for at least two full
> backups.
Correct, if you always do full backups, then the second run will consume
a full backup's worth of space. (Not fully correct with bees running -> *)
That would be the first thing I'd address. Even the simplest backup
solutions (e.g. based on rsync) make use of destination rotation and
transmit only the changes to the backup (-> incremental or differential
backups). I have never considered successive full backups a backup
"solution".
For me only the first backup is a full backup; every subsequent backup is
incremental.
Regarding deduplication, I see benefits either when the user moves files
from one directory to another, with partly changed files (my backup
solution dedupes on a per-file basis via hardlinks only), or with system
backups of several different machines. I prefer file-based backups, so my
backup solution's deduplication skills are really limited. But a good
block-based backup solution can handle all these cases by itself; then no
filesystem-based deduplication is needed.
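A file-level, hardlink-only deduplication pass like the one I use can be
sketched in a few lines of Python (a minimal illustration of the idea, not
any particular tool; it compares whole-file hashes only, so partly changed
files are never deduplicated):

```python
import hashlib
import os

def dedupe_hardlink(root):
    """Replace duplicate files under `root` with hardlinks to one copy.

    Files count as duplicates when their SHA-256 digests match, i.e.
    file-level (not block-level) deduplication.
    """
    seen = {}  # digest -> path of the first file with that content
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in seen:
                os.remove(path)              # drop the duplicate ...
                os.link(seen[digest], path)  # ... and hardlink the kept copy
            else:
                seen[digest] = path
```

After running it, duplicate files share one inode, so they occupy the
space of a single copy.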
If your problem is only backup related and you are flexible regarding
your backup solution, then choosing a backup solution with a good
deduplication feature is probably your best choice. The solution doesn't
have to be complex. Even simple backup solutions like borg are fine here
(borg: chunk-based deduplication, even of parts of files, across several
backups of several different machines). Even your criterion of not
writing duplicate data in the first place is fulfilled here.
(see borgbackup in Debian repository; disclaimer: I do not have personal
experience with borg as I'm using other solutions)
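The chunk-based idea behind borg can be shown with a toy content-addressed
store (a sketch of the principle only: borg actually uses variable-size,
content-defined chunking plus compression and encryption, while this uses
fixed-size chunks for brevity):

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; borg uses content-defined ones

def store_backup(data, chunk_store):
    """Split `data` into chunks; store each distinct chunk only once.

    Returns the list of chunk hashes needed to reconstruct `data`.
    A second backup of mostly identical data adds almost no new chunks,
    so duplicate data is never written in the first place.
    """
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(key, chunk)  # write only if unseen
        refs.append(key)
    return refs

def restore_backup(refs, chunk_store):
    """Reassemble the original data from its chunk references."""
    return b"".join(chunk_store[k] for k in refs)
```

Backing up the same data twice leaves the store unchanged; only the small
list of references is new, which is what makes repeated "full" backups of
several machines cheap.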
> I wouldn't mind running it from time to time, though I don't know that
> I would have a lot of duplicate data other than backups. How much space
> might I expect to gain from using bees, and how much memory does it
> require to run?
Bees should run as a service 24/7 and catches all written data right
after it is written. That is comparable to in-band deduplication, even
though it is out-of-band by definition. (*) This way, writing many
duplicate files will potentially result in duplicates being removed
before all the data has even been written to disk.
Therefore memory consumption is also like that of in-band deduplication
(ZFS...), which means you should reserve more than 1 GB of RAM per 1 TB
of data. But it's flexible: even less memory is usable, though then it
cannot find all duplicates, as the hash table for all the data doesn't
fit into memory. (Nevertheless, even then deduplication is more efficient
than you'd expect: when bees finds a duplicate block, it also examines the
blocks around it, so for big files a single match in the hash table is
sufficient to deduplicate the whole file.)
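To get a feel for the memory/coverage trade-off, here is a
back-of-the-envelope calculation (the 16-byte entry size and 4 KiB block
size are illustrative assumptions, not bees' actual internal format):

```python
def indexable_data(table_bytes, entry_bytes=16, block_bytes=4096):
    """Roughly how much data a hash table of `table_bytes` can cover,
    assuming one `entry_bytes` entry per `block_bytes` block of data.

    The ratio is simply block_bytes / entry_bytes; with less RAM, only
    a fraction of all blocks gets an entry, so some duplicates are
    missed (mitigated in practice by scanning around each match).
    """
    return (table_bytes // entry_bytes) * block_bytes

GIB = 1024 ** 3
# Under these assumed sizes, a 1 GiB table fully indexes 256 GiB of data.
print(indexable_data(1 * GIB) // GIB)
```

The exact numbers depend entirely on the assumed entry and block sizes;
the point is only that coverage scales linearly with the table size.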
regards
hede