
Re: [Debconf-discuss] btrfs deduplication / bedup users at DebConf13?



hi,

On Fri, Aug 16, 2013 at 10:33:54AM +0200, Stefano Zacchiroli wrote:
> On Thu, Aug 15, 2013 at 10:26:32PM -0300, Rogério Brito wrote:
> > If you only need to use this coarse deduplication, then take a look at
> > rdfind, instead of hardlink. Hardlink compares the files that are
> > likely to be the same (e.g., same size) byte by byte, while rdfind
> > uses hashes (md5 or sha1, at your option) to compare the files.
> 
> Right, I'm benchmarking this option as we speak. Given we already have
> hashes (SHA256 in this case) in sources.d.n, it would be cool for us if
> rdfind / hardlink / $your_tool_here can be fed external DB hashes. And
> none of the tools I've looked at seem to do that. I'll probably look
> into patching the one I'll end up choosing for that, but if you know of
> a similar tool that can use an external hash db, just shout!
> 
> And thanks for all the useful feedback people have poured into this
> thread.
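About feeding external hashes: here is a minimal bash sketch of what such a tool could do. It takes an existing hash list and hard-links files that share a SHA256, so nothing gets re-hashed. The function name `link_by_hash_list` is hypothetical. The sketch also assumes the hash DB can be exported in `sha256sum` output format (`<hash>  <path>`, one file per line, no escaped file names). I have not checked that against what sources.d.n actually stores.

```shell
#!/bin/bash
# Sketch only: hard-link files whose SHA256 is already recorded in an
# external hash list, so nothing has to be re-hashed.
# ASSUMPTION: the list is in sha256sum's output format ("<hash>  <path>"),
# one file per line, with no escaped (backslash/newline) file names.
set -euo pipefail

link_by_hash_list() {
    local list=$1 sum path
    declare -A first          # hash -> first path seen with that hash
    while read -r sum path; do
        # Skip blank lines and entries whose file no longer exists.
        [ -n "$sum" ] && [ -f "$path" ] || continue
        if [ -z "${first[$sum]:-}" ]; then
            first[$sum]=$path
        elif [ "$path" -ef "${first[$sum]}" ]; then
            continue          # already the same inode, nothing to do
        elif cmp -s -- "${first[$sum]}" "$path"; then
            # Double-check the bytes before trusting the external hash,
            # then replace the duplicate with a hard link.
            ln -f -- "${first[$sum]}" "$path"
        fi
    done < "$list"
}
```

The `cmp` check means a stale or wrong entry in the DB can never link two different files. If the hash DB is fully trusted, it can be dropped to avoid reading both files.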

Zack told me in person that he only cares about linking files across
multiple versions of the same package.

For this purpose I would use rsync with --link-dest (shell output below):

-----------------------------------------------------------------------

cstamas@rohan2 /tmp/testdir-1 % find -type f | xargs md5sum
047209b3518ad40c2f9f413aa104e60f  ./tmp/test-v1/f1
9b92e11458772a8cbccb5078d1cf7746  ./tmp/test-v1/f2
3c5163369cfd2182ccaa830be92314ba  ./tmp/test-v1/f3
7cab5e10b29cf5856ff9cd07aa918854  ./tmp/test-v1/f4
047209b3518ad40c2f9f413aa104e60f  ./tmp/test-v2/f1
599fc96c588d52c227b7ab86f49e708a  ./tmp/test-v2/f3
7cab5e10b29cf5856ff9cd07aa918854  ./tmp/test-v2/f4
a21005d261eca27422a946d14b30e423  ./tmp/test-v2/f5
cstamas@rohan2 /tmp/testdir-1 % rsync -av tmp/test-v1/ test-v1
sending incremental file list
created directory test-v1
./
f1
f2
f3
f4

sent 16652 bytes  received 91 bytes  33486.00 bytes/sec
total size is 16384  speedup is 0.98
cstamas@rohan2 /tmp/testdir-1 % rsync -av --link-dest=/tmp/testdir-1/test-v1 tmp/test-v2/ test-v2
sending incremental file list
created directory test-v2
./
f3
f5

sent 8375 bytes  received 54 bytes  16858.00 bytes/sec
total size is 16384  speedup is 1.94
cstamas@rohan2 /tmp/testdir-1 % l test-v1
total 16
-rw-r--r-- 2 cstamas cstamas 4096 Aug 16 12:03 f1
-rw-r--r-- 1 cstamas cstamas 4096 Aug 16 12:04 f2
-rw-r--r-- 1 cstamas cstamas 4096 Aug 16 12:04 f3
-rw-r--r-- 2 cstamas cstamas 4096 Aug 16 12:04 f4
cstamas@rohan2 /tmp/testdir-1 % l test-v2
total 16
-rw-r--r-- 2 cstamas cstamas 4096 Aug 16 12:03 f1
-rw-r--r-- 1 cstamas cstamas 4096 Aug 16 12:04 f3
-rw-r--r-- 2 cstamas cstamas 4096 Aug 16 12:04 f4
-rw-r--r-- 1 cstamas cstamas 4096 Aug 16 12:05 f5

-----------------------------------------------------------------------

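As you can see, files that are unchanged between the two versions (f1
and f4 here, with link count 2) are hard-linked to the previous version
instead of being copied again, so they take no extra space. Only the
changed (f3) and new (f5) files are transferred. Note that rsync decides
what is "unchanged" with its usual quick check (size and modification
time) unless you add --checksum. If it makes sense, you can do this
against several (maybe all) earlier versions by passing --link-dest
multiple times.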

HTH.

Regards,
  cstamas
-- 
CSILLAG Tamas (cstamas) - http://cstamas.hu/

