
Bug#662080: ITP: hadori -- Hardlinks identical files



On Sun, Mar 04, 2012 at 03:18:50AM +0100, Timo Weingärtner wrote:
> Hello Julian Andres,
> 
> On 2012-03-04 at 01:07:42 you wrote:
> > On Sun, Mar 04, 2012 at 12:31:16AM +0100, Timo Weingärtner wrote:
> 
> > > The initial comparison was with hardlink, which got OOM killed with a
> > > hundred backups of my home directory. Last night I compared it to duff
> > > and rdfind which would have happily linked files with different st_mtime
> > > and st_mode.
> > 
> > You might want to try hardlink 0.2~rc1. In any case, I don't think we need
> > yet another such tool in the archive. If you want that algorithm, we can
> > implement it in hardlink 0.2 using probably about 10 lines. I had that
> > locally and it works, so if you want it, we can add it and avoid the
> > need for one more hack in that space.
> 
> And why is lighttpd in the archive? Apache can do the same ...

Can it? Apache uses threads, lighttpd uses an event loop for
processing. That's a different topic though.

But in any case, avoiding yet another tool with the same security
issues (CVE-2011-3632) and the same bugs (plus some of its own) as
the tools we currently have would be a good idea.

hadori bugs:
  - Race, possible data loss: unlink() is called before link(). If
    link() fails, the data may be lost (the best solution appears
    to be to link to a temporary file in the target directory and
    then rename() it to the target name, making the replacement
    atomic; see the sketch after this list)
  - Error checking: Errors when opening or reading files are not
    checked (std::ifstream reports failures via failbit/badbit
    instead of throwing by default).
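
Neither tool works this way today; here is a minimal sketch of the atomic
replacement idea, assuming source and target are on the same filesystem
(the temporary name is a made-up convention and error handling is trimmed):

    // Sketch: link the source to a temporary name in the target's
    // directory first, then rename() over the target, so there is no
    // window in which the target is missing.
    #include <cstdio>      // std::rename
    #include <string>
    #include <unistd.h>    // link(), unlink()

    bool replace_with_link(const std::string &source, const std::string &target)
    {
        // Hypothetical temporary name; a real tool would pick a unique one.
        std::string tmp = target + ".tmp-hadori";
        if (link(source.c_str(), tmp.c_str()) != 0)
            return false;                        // nothing has been touched yet
        if (std::rename(tmp.c_str(), target.c_str()) != 0) {
            unlink(tmp.c_str());                 // clean up; target is untouched
            return false;
        }
        return true;                             // target now shares source's inode
    }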

Common security issue, same as CVE-2011-3632 for Fedora's hardlink:
	[Unsafe operations on changing trees]
  - If a regular file is replaced by a non-regular one before the
    open() for reading, the program reads from a non-regular file
    (see the sketch below for one way to detect this)
  - A source file is replaced by a file with a different owner or
    different permissions after the stat() and before the link()
  - A component of the path is replaced by a symbolic link after
    the initial stat()ing and readdir()ing. An attacker may use
    that to write outside of the intended directory.

(Addressed in Fedora's hardlink, and in my hardlink, by adding a
 section to the manual page stating that it is not safe to run the
 program on changing trees.)
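
Neither tool implements it, but the first of those windows can at least be
narrowed by re-checking the file descriptor after open() instead of trusting
the earlier stat(); a hedged sketch (open_checked is a hypothetical helper):

    // Sketch: open, then fstat() the descriptor and verify it is still the
    // regular file we stat()ed earlier. This narrows, but does not remove,
    // the race; the symlinked-path-component case is not addressed here.
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int open_checked(const char *path, const struct stat &expected)
    {
        // O_NOFOLLOW: refuse a symlink at the last component;
        // O_NONBLOCK: do not hang if the file was swapped for a FIFO.
        int fd = open(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK);
        if (fd < 0)
            return -1;
        struct stat st;
        if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) ||
            st.st_dev != expected.st_dev || st.st_ino != expected.st_ino) {
            close(fd);                 // not the file we saw before
            return -1;
        }
        return fd;
    }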

Possibly hardlink-only bugs:
   - Exaggeration of sizes. hardlink currently counts st_size as
     saved for every link it replaces, even if st_nlink > 1. I don't
     know what hadori does there. (Fixing that in hardlink requires
     more work, as we currently do not combine links to the same
     inode in any way, and we likely want --dry-run to work
     correctly as well; see the sketch after this list.)
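
For illustration, one way to make the accounting exact would be to count an
inode's st_size only once its last remaining link has been replaced; a
hedged sketch (savings_counter is a made-up structure, not code from either
tool):

    // Sketch: count st_size as saved only when *all* links to an inode have
    // been replaced, i.e. when the kernel can actually free the data blocks.
    #include <sys/stat.h>
    #include <cstdint>
    #include <map>
    #include <utility>

    struct savings_counter {
        // (dev, ino) -> number of links to that inode replaced so far
        std::map<std::pair<dev_t, ino_t>, nlink_t> replaced;
        std::uintmax_t bytes_saved = 0;

        void link_replaced(const struct stat &st) {
            nlink_t &n = replaced[{st.st_dev, st.st_ino}];
            if (++n == st.st_nlink)    // last link gone: data really freed
                bytes_saved += st.st_size;
        }
    };

Links to the inode that live outside the scanned tree then simply keep the
count below st_nlink, which errs on the conservative side.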

You can also drop your race check. The tool is unsafe on
changing trees anyway, so you don't need to check whether
someone else deleted the file, especially if you're then
linking to it anyway.

> 	
> > hardlink 0.2 is written in C, and uses a binary tree to map
> > (dev_t, off_t) to a struct file which contains the stat information
> > plus name for linking. It requires two allocations per file, one for
> > the struct file with the filename, and one for the node in the tree
> > (well, actually we only need the node for the first file with a
> >  specific (dev_t, off_t) tuple). A node has 3 pointers.
> 
> The "hardlink" I used at that time was written in python and definitely didn't 
> do it the way I want.

I know. I knew that there were problems on large trees in 2009, but got
nowhere with a fix in Python. We still have the two passes in hardlink
and thus currently need to keep all files in memory; I did not carry the
link-first mode over from my temporary C++ rewrite because memory usage
was not much different in my test case. But since my test case was just
a run on /, it may not be representative. If there are lots of
duplicates, link-first can definitely help.

The one that works exactly as you want is most likely Fedora's hardlink.

> 
> hadori is written in C++11 which IMHO makes it look a little more readable. 

Yes. It looks readable, but it also has far fewer features than hardlink
(features that were added to hardlink because of user requests). You can
get it even shorter
and more efficient if you use POSIX's nftw() to walk the directory tree
instead of your recurse()/recurse_start() pair (you just need to write
one callback that takes a complete file path, a type info, a stat buffer,
and the basename of the file). Does not work on Haiku though, or BSDs
prior to 2002. Alternatively, there's also fts() if you don't care about
the System V line of UNIXes (e.g. Solaris).

You can tell both not to cross mount points and not to follow symbolic
links, which makes the whole thing very easy.
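
A minimal sketch of that approach, assuming a hypothetical handle_file()
standing in for hadori's per-file logic:

    // Sketch: walk a tree with nftw(), staying on one filesystem and not
    // following symbolic links.
    #define _XOPEN_SOURCE 500          // for nftw() on glibc
    #include <ftw.h>
    #include <sys/stat.h>
    #include <cstdio>

    static int visit(const char *fpath, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
    {
        if (typeflag == FTW_F && S_ISREG(sb->st_mode)) {
            const char *base = fpath + ftwbuf->base;   // basename via struct FTW
            std::printf("%s (%s): %lld bytes\n",
                        fpath, base, (long long)sb->st_size);
            // handle_file(fpath, *sb);  // per-file hardlinking logic goes here
        }
        return 0;                        // non-zero would stop the walk
    }

    int main(int argc, char **argv)
    {
        const char *root = argc > 1 ? argv[1] : ".";
        // FTW_PHYS: do not follow symlinks; FTW_MOUNT: stay on one filesystem
        return nftw(root, visit, 64, FTW_PHYS | FTW_MOUNT) == 0 ? 0 : 1;
    }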

> It 
> started with tree based map and multimap, now it uses the unordered_ (hash 
> based) versions which made it twice as fast in a typical workload.

That's strange. In my (not published) C++ version of hardlink, unordered
(multi) maps were only slightly faster than ordered ones. I then rewrote
the code in C to make it more readable to the common DD who does not
want to work with C++, and more portable.

And it does not seem right that so much time is spent in the map, at
least not without caching. Normally you most likely do not have the
tree(s) you are hardlinking in the cache, so read() should dominate.

> 
> The main logic is in hadori.C, handle_file and uses:
> 
> std::unordered_map<ino_t, inode const> kept;
> std::unordered_map<ino_t, ino_t> to_link;
> std::unordered_multimap<off_t, ino_t> sizes;
> 
> class inode contains a struct stat, a file name and an adler checksum, but I 
> plan to drop the last one because I think the hashing option is no great gain.

I basically have the equivalent of a multimap<off_t, inode> in my code. For
hashing, you may want to use crc32 instead of adler32. Adler32 is prone to
collisions (mostly on small inputs, but still far more collisions than
CRC32). Performance is not really different; you will spend far more time
in read() anyway.
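
Both checksums are a single zlib call away, so switching is cheap; a sketch,
with the file reading left to the caller:

    // Sketch: checksumming a buffer with zlib's crc32() vs. adler32().
    // Build with -lz.
    #include <zlib.h>
    #include <cstdint>
    #include <vector>

    std::uint32_t checksum_crc32(const std::vector<unsigned char> &buf)
    {
        uLong c = crc32(0L, Z_NULL, 0);                    // initial value
        return (std::uint32_t)crc32(c, buf.data(), (uInt)buf.size());
    }

    std::uint32_t checksum_adler32(const std::vector<unsigned char> &buf)
    {
        uLong a = adler32(0L, Z_NULL, 0);                  // initial value
        return (std::uint32_t)adler32(a, buf.data(), (uInt)buf.size());
    }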


-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
