Re: deduplicating file systems: VDO with Debian?

To: debian-user@lists.debian.org
Subject: Re: deduplicating file systems: VDO with Debian?
From: hw <hw@adminart.net>
Date: Tue, 08 Nov 2022 05:31:53 +0100
Message-id: <ae59abfba405f07d46342aa3e9ac0841ed1ccb9e.camel@adminart.net>
In-reply-to: <76f44e8de3b785bf4d3c6f2d33f487d4@der-he.de>
References: <45998a8cdc61d0945fc5907bc8b30b697b3f5703.camel@adminart.net> <76f44e8de3b785bf4d3c6f2d33f487d4@der-he.de>

On Mon, 2022-11-07 at 16:29 +0100, hede wrote:
> On 07.11.2022 02:57, hw wrote:
> > Hi,
> > 
> > Is there no VDO in Debian, and what would be good to use for
> > deduplication with Debian?  Why isn't VDO in the standard kernel?
> > Or is it?
> 
> I used VDO on Debian some time ago and don't remember any big 
> problems. AFAIR I compiled it myself; there were no prebuilt packages.

Cool, I could give that a try, ty.

> I switched to btrfs for other reasons. Not even for performance. The VDO 
> Layer eats performance, yes, but compared to naked ext4 even btrfs is 
> slow.

Really?  I never noticed that btrfs would be slow.  But then, it's been a long
time since I used ext4 ...

> > There is no point in deduplicating backups after they're done because
> > I don't need to save disk space for them when I can fit them in the
> > first place.
> 
> That's only one point.

What are the others?

> And it's not really a valid one, I think, as you typically do not 
> run into space problems with one single action (YMMV). Running 
> multiple sessions and out-of-band deduplication between them works 
> for me.

That still requires you to have enough disk space for at least two full backups.
I can see it working for three backups because you can deduplicate the first
two, but not for two.  And why would I deduplicate when I have sufficient disk
space?

> In-band deduplication (that's the one you want) has some drawbacks, too: 
> high resource usage. You need plenty of RAM (up to several gigabytes 
> per terabyte of storage) and write success is delayed (-> slow direct I/O).

Well, if it takes 5 days or so to make a backup, that won't be very useful.  It
takes more than long enough already because my disks can only sustain so much.

> For out-of-band deduplication there are multiple different 
> implementations. File-based dedup on a directory basis can be very fast 
> and resource-economical, for example via rdfind or jdupes. Block-based 
> dedup, like via bees for btrfs (that's the one I use), is closer to 
> in-band deduplication (including high RAM usage). Bees can be switched off and 
> on at any time (for example if it's a small home-system which runs more 
> demanding tasks from time to time) and switching it on again resumes at 
> the last state (it starts at the last transaction id which was processed 
> -> btrfs knows its transactions).

Hm.  I wouldn't mind running it from time to time, though I don't know that I
would have a lot of duplicate data other than backups.  How much space might I
expect to gain from using bees, and how much memory does it require to run?
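
For the file-based case mentioned above, a rough way to see how many whole-file
duplicates exist, and how much space they occupy, is to group files by size and
then by checksum.  The following is only a minimal sketch of that general idea;
it is not how rdfind or jdupes are implemented, the function names are made up
for illustration, and it says nothing about what block-based tools like bees
would find.  It also does not account for files that are already hard-linked
to each other.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root that have identical content.

    Returns a list of lists of paths; each inner list holds two or more files.
    (Hypothetical helper, for illustration only.)
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, paths in by_size.items():
        # Empty files and files with a unique size are not worth hashing.
        if size == 0 or len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def reclaimable_bytes(groups):
    """Bytes saved if each duplicate group were reduced to a single copy."""
    return sum(os.path.getsize(g[0]) * (len(g) - 1) for g in groups)
```

Calling find_duplicates() on a backup directory and passing the result to
reclaimable_bytes() gives an upper bound on what file-level deduplication could
save there; it only reads files and changes nothing on disk.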
