On 15 August 2013 01:08, "Adam Borowski" <kilobyte@angband.pl> wrote:
First, some context: I'm trying to efficiently store a huge nfs-root farm (objective: 500+), so the performance impact must be very limited.
I'm not trying to dedup inside files. I'm only aiming at deduping whole files.
> There are two ways:
I'm exploring a third way, entirely userspace (I'll draft a mixed one at the end).
For that I'm using the "hardlink" package.
It operates at the file level, in userspace. It has some serious drawbacks:
* it is subject to a race condition when files change while it runs
* once hardlinked, a file can no longer be written in place. It has to be written as a new file, which then replaces the old one using rename(2).
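
To illustrate the second drawback, here is a minimal Python sketch of the "write a new file, then rename(2)" pattern. The function name replace_contents is hypothetical, and the sketch assumes the temporary file can be created in the same directory as the target (so the rename stays on one filesystem). It does not preserve the original mode or ownership.

```python
import os
import tempfile


def replace_contents(path, data):
    """Give `path` new contents without modifying the inode it used to share."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename never crosses filesystems.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace() maps to rename(2) on POSIX: `path` now points at the
        # new inode, other hard links keep pointing at the old one.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Any other name that was hardlinked to the old inode keeps the old contents, which is exactly what a deduplicated farm needs.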
Despite these, it has huge benefits:
* Nil performance impact.
* Asynchronous (offline) deduplication can be scheduled at off-peak hours.
* Usable on old-stable kernels.
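
For readers unfamiliar with the approach, whole-file deduplication by hardlinking can be sketched roughly as below. This is not the code of the "hardlink" package, only an illustration under stated assumptions: the names file_digest and dedup_tree and the ".dedup-tmp" suffix are made up, the whole tree is assumed to live on one filesystem, and ownership, mode and timestamps are ignored. It also has the same race as mentioned above if a file changes between hashing and linking.

```python
import hashlib
import os
import stat


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_tree(root):
    """Hard-link identical regular files under `root`; return the number of links made."""
    seen = {}  # (size, digest) -> first path seen with that content
    linked = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets...
            key = (st.st_size, file_digest(path))
            original = seen.setdefault(key, path)
            if original == path:
                continue  # first file with this content
            if os.path.samestat(os.lstat(original), st):
                continue  # already the same inode
            # Link under a temporary name, then rename over the duplicate,
            # so `path` never disappears, even briefly.
            tmp = path + ".dedup-tmp"  # hypothetical suffix
            os.link(original, tmp)
            os.replace(tmp, path)
            linked += 1
    return linked
```

Such a pass could be run from cron at off-peak hours, e.g. dedup_tree("/srv/nfsroot") (the path is only an example).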
> * a nice and clean way. The kernel interface would need to be "hey kernel,
> I think the block in fd 1 offset 0 might be same as a block in fd 2 offset
> 4096, care to compare and perhaps combine them?".
So all the cleverness of *what* to merge would only happen in userspace? What would be the impact on reads at runtime?
> Offline (a confusing name, it's a mounted filesystem but at a later time)
It can even be done asynchronously, by registering an inotify watch on it and then queuing the dedups in a userspace daemon.
> See how much fun can we have with data structures?
> And the best of all, the kernel needs just a single syscall, with all the complexity done in userspace.
That's the whole beauty of it. You can then build a whole ecosystem of software that addresses that complexity in every possible way :)
Now, the mixed approach I promised earlier:
As the pure userspace take is not ideal, I was thinking about adding an in-place FUSE layer that would synchronously copy deduped hardlinks on write. This could be triggered by an open(w) or by an actual write().
Other than that, it would just offer raw, native access to every other file operation.
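
To make the idea a bit more concrete, here is a rough Python sketch of the copy-on-write step only, without any FUSE binding. The names break_hardlink and open_for_write are hypothetical. A FUSE open handler could call something like this before passing the call through. It assumes the private copy can be created in the same directory, and it does not restore ownership.

```python
import os
import shutil
import tempfile


def break_hardlink(path):
    """If `path` shares its inode with other names, give it a private copy.

    Returns True if a copy was made, False if the file was not shared.
    """
    if os.stat(path).st_nlink < 2:
        return False
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        shutil.copy2(path, tmp)  # data, permission bits and timestamps
        os.replace(tmp, path)    # rename(2): `path` now has its own inode
    except BaseException:
        os.unlink(tmp)
        raise
    return True


def open_for_write(path, flags=os.O_WRONLY):
    """Open `path`, unsharing it first when the flags request write access."""
    if flags & (os.O_WRONLY | os.O_RDWR):
        break_hardlink(path)
    return os.open(path, flags)
```

Triggering on open is the simpler choice. Triggering on the first real write() would avoid copying files that are opened for writing but never modified, at the cost of tracking state per file handle.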
Steve.