
Re: [Debconf-discuss] btrfs deduplication / bedup users at DebConf13?

On 15 August 2013 at 01:08, "Adam Borowski" <kilobyte@angband.pl> wrote:

First, some context: I'm trying to store a huge NFS-root farm efficiently (target: 500+), so the performance impact must stay very limited.

I'm not concerned with deduplication inside files. I'm only aiming at deduplicating whole files.

> There are two ways:

I'm exploring a third way, entirely in userspace (I'll sketch a mixed one at the end).

For that I'm using the "hardlink" package.

It operates at the file level, in userspace. It has some serious drawbacks:
* it has a race-condition bug when files change while it runs
* once a file is hardlinked, it can no longer be written in place, since the write would show up under every linked name. The new content has to be written to a new file, which then replaces the old one using rename(2).
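
A minimal sketch of that write-then-rename(2) step, in Python. The function name replace_contents is hypothetical, and the sketch assumes the temporary file can live in the same directory as the target (rename is only atomic within one filesystem). It does not restore the original owner or permission bits.

```python
import os
import tempfile


def replace_contents(path, data):
    """Write `data` to a fresh inode, then rename it over `path`.

    Other hardlinks to the old inode keep the old content.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file is created in the same directory so that the
    # final rename stays on one filesystem. mkstemp uses mode 0600.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # rename(2) under the hood
    except BaseException:
        os.unlink(tmp)
        raise
```

After the call, the name given in path points at a new inode, so the shared copy is left untouched.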

Despite these, it has huge benefits:
* No performance impact.
* Asynchronous (offline) deduplication can be scheduled for off-peak hours.
* Usable on old-stable kernels.
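
For illustration, an offline whole-file pass of this kind can be sketched in a few lines of Python. This is not how the "hardlink" package is implemented, only a sketch under assumptions: the names file_digest and dedup_tree are made up, files are matched on size and SHA-256 only (owner, mode and timestamps are ignored), and nothing guards against a file changing between hashing and linking, which is exactly the kind of race mentioned above.

```python
import hashlib
import os
import stat


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_tree(root):
    """Hardlink identical regular files under `root`.

    Returns the number of files that were replaced by a hardlink.
    """
    seen = {}  # (size, digest) -> first path seen with that content
    relinked = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks, devices and empty files.
            if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                continue
            key = (st.st_size, file_digest(path))
            keeper = seen.setdefault(key, path)
            if keeper == path or os.path.samestat(st, os.lstat(keeper)):
                continue  # first occurrence, or already the same inode
            # Link under a temporary name, then rename over the duplicate.
            tmp = path + ".dedup-tmp"
            os.link(keeper, tmp)
            os.replace(tmp, path)
            relinked += 1
    return relinked
```

Run from cron during off-peak hours, a second pass over an already deduplicated tree relinks nothing.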

> * a nice and clean way.  The kernel interface would need to be "hey kernel,
>   I think the block in fd 1 offset 0 might be same as a block in fd 2 offset
>   4096, care to compare and perhaps combine them?".

So all the cleverness of deciding *what* to merge would live only in userspace? What would be the impact on reads at runtime?
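
To make the split concrete, the userspace half could be as simple as hashing fixed-size blocks and reporting matching offsets, which the kernel would then be asked to verify and merge. A rough Python sketch, assuming 4096-byte blocks; block_index and candidate_pairs are hypothetical names, and the kernel call itself is not shown:

```python
import hashlib

BLOCK = 4096  # assumed block size, matching the offsets quoted above


def block_index(path):
    """Map the digest of each full block to the offsets where it occurs."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK)
            if len(block) < BLOCK:
                break  # ignore a trailing partial block
            digest = hashlib.sha256(block).digest()
            index.setdefault(digest, []).append(offset)
            offset += BLOCK
    return index


def candidate_pairs(path_a, path_b):
    """Return sorted (offset_in_a, offset_in_b) pairs of matching blocks."""
    index_a = block_index(path_a)
    pairs = []
    for digest, offsets_b in block_index(path_b).items():
        for off_a in index_a.get(digest, ()):
            for off_b in offsets_b:
                pairs.append((off_a, off_b))
    return sorted(pairs)
```

Each pair is only a hint: the kernel would still have to compare the blocks itself before combining them.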

> Offline (a confusing name, it's a mounted filesystem but at a later time)

It can even be done asynchronously, by registering an inotify watch on it and then queuing the dedups in a userspace daemon.
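
The queuing half of such a daemon might look like the Python sketch below. The inotify watcher is assumed and not shown: it would simply put changed paths on the queue. start_dedup_worker and handle_path are hypothetical names.

```python
import queue
import threading


def start_dedup_worker(handle_path):
    """Start a background thread that calls handle_path(path) per queued path.

    Returns (pending, thread). Putting None on the queue stops the worker.
    """
    pending = queue.Queue()

    def worker():
        while True:
            path = pending.get()
            try:
                if path is None:
                    return
                handle_path(path)
            except Exception:
                # A real daemon would log the failure; the sketch just
                # moves on to the next path.
                pass
            finally:
                pending.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return pending, thread
```

The watcher never blocks on the deduplication itself, which keeps the cost out of the write path.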

> See how much fun can we have with data structures?

> And the best of all, the kernel needs just a single syscall, with all the complexity done in userspace.

That's the whole beauty of it. You can then build a whole ecosystem of software to address that complexity in every possible way :)

Now, the mixed approach I promised earlier:

As a pure userspace approach is not ideal, I was thinking about adding an in-place FUSE layer that would synchronously copy deduplicated hardlinks on write. The copy could be triggered by an open() for writing or by an actual write().

Other than that, it would just offer raw, native access to every other file operation.
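
The copy-up step such a layer would perform on open-for-write can be sketched without any FUSE binding. In the Python below, open_for_write is a hypothetical helper, not part of any FUSE API; it assumes a link count above 1 means the file was deduplicated, and it leaves a window between the stat and the open in which another process could relink the file.

```python
import os
import shutil
import tempfile


def open_for_write(path, mode="r+b"):
    """Give `path` a private inode if it is hardlinked, then open it."""
    if os.stat(path).st_nlink > 1:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            # copy2 copies the data plus mode bits and timestamps
            # (ownership is not copied).
            shutil.copy2(path, tmp)
            os.replace(tmp, path)  # the name now points at the private copy
        except BaseException:
            os.unlink(tmp)
            raise
    return open(path, mode)
```

Triggering on open() keeps the logic simple; triggering on the first write() would avoid copying files that are opened for writing but never modified.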

