
Bug#662080: ITP: hadori -- Hardlinks identical files



Hello Julian Andres,

On 2012-03-04 at 12:31:39, you wrote:
> But in any case, avoiding yet another tool with the same security
> issues (CVE-2011-3632) and bugs (and more bugs) as we currently
> have would be a good idea.
> 
> hadori bugs:
>   - Race, possible data loss: Calls unlink() before link(). If
>     link() fails the data might be lost (best solution appears
>     to be to link to a temporary file in the target directory
>     and then rename to target name, making the replacement
>     atomic)

I copied that behaviour from ln -f, which then has the same bug.
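
The temporary-link-then-rename() fix suggested above can be sketched in Python, whose os.link() and os.rename() map directly onto the POSIX calls. This is a minimal sketch, not hadori's or hardlink's actual code; the function name and the `.hardlink-tmp-` prefix are hypothetical.

```python
import os
import uuid


def replace_with_hardlink(source: str, target: str) -> None:
    """Make `target` a hard link to `source` without ever removing `target`
    first: create the link under a temporary name in the target's directory,
    then rename() it over `target`. On POSIX, rename() within one file system
    replaces the old directory entry atomically, so a failure at any step
    leaves `target` with its original contents."""
    if os.path.samefile(source, target):
        # Already the same inode. POSIX rename() between two links to the
        # same file does nothing, which would leave the temporary link behind.
        return
    target_dir = os.path.dirname(os.path.abspath(target))
    # Hypothetical naming scheme; any unused name in the same directory works.
    tmp = os.path.join(target_dir, ".hardlink-tmp-" + uuid.uuid4().hex)
    os.link(source, tmp)        # on failure, target is untouched
    try:
        os.rename(tmp, target)  # atomic swap of the directory entry
    except OSError:
        os.unlink(tmp)          # remove the stray temporary link
        raise
```

Unlike the unlink()-then-link() order, a failing link() here leaves the original file in place.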

>   - Error checking: Errors when opening or reading files are
>     not checked (ifstream reports them via the failbit and
>     related stream state).

If only one of the two files fails, nothing bad happens. If both fail, bad
things might happen; that's right.
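
A comparison that cannot mistake an I/O failure for equality might look like the following minimal sketch. It is not hadori's code (hadori uses C++ ifstreams); the function names are hypothetical, and it relies on Python raising OSError on open/read errors instead of silently returning an empty read.

```python
import sys


def files_identical(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> bool:
    """Compare two files byte by byte. Open and read errors raise OSError
    rather than looking like end-of-file, so an I/O failure on one or both
    files can never make them compare equal."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files ended at the same offset
                return True


def safe_to_link(path_a: str, path_b: str) -> bool:
    """Treat any I/O error as 'not identical' so the pair is never linked."""
    try:
        return files_identical(path_a, path_b)
    except OSError as err:
        print(f"skipping {path_a} / {path_b}: {err}", file=sys.stderr)
        return False
```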

> Common security issue, same as CVE-2011-3632 for Fedora's hardlink:
> 	[Unsafe operations on changing trees]
>   - If a regular file is replaced by a non-regular one before an
>     open() for reading, the program reads from a non-regular file
>   - A source file is replaced by one file with different owner
>     or permissions after the stat() and before the link()
>   - A component of the path is replaced by a symbolic link after
>     the initial stat()ing and readdir()ing. An attacker may use
>     that to write outside of the intended directory.
> 
> (Fixed in Fedora's hardlink, and my hardlink by adding a section
>  to the manual page stating that it is not safe to run the
>  program on changing trees).

I think this kind of bug will remain until it is possible to open/link by
inode number. Perhaps the *at() functions can help for the file currently
being examined.

So far I have only used it for my backups, which are accessible only by me
(and root).
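
How far the *at() calls can help is sketched below: open the file relative to a directory descriptor (Python's `dir_fd` parameter uses openat() underneath), refuse symlinks with O_NOFOLLOW, and fstat() the descriptor to confirm it is still the regular file, owner and mode recorded during the scan. This is a hypothetical helper under those assumptions; it narrows the races but does not close them (a parent directory can still be swapped before `dir_path` is opened).

```python
import os
import stat


def open_verified(dir_path: str, name: str, expected: os.stat_result) -> int:
    """Open `name` relative to `dir_path` and check that it is still the
    regular file recorded by an earlier lstat(): same device, inode, owner
    and mode. Returns a read-only file descriptor or raises OSError."""
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # O_NOFOLLOW: fail (ELOOP) instead of following a planted symlink.
        # O_NONBLOCK: do not hang if `name` was replaced by a FIFO.
        fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK,
                     dir_fd=dir_fd)
    finally:
        os.close(dir_fd)
    st = os.fstat(fd)  # describes the object actually opened, not the path
    unchanged = (stat.S_ISREG(st.st_mode)
                 and (st.st_dev, st.st_ino) == (expected.st_dev, expected.st_ino)
                 and st.st_uid == expected.st_uid
                 and st.st_mode == expected.st_mode)
    if not unchanged:
        os.close(fd)
        raise OSError(f"{name!r} in {dir_path!r} changed since it was scanned")
    return fd
```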

> Possibly hardlink only bugs:
>    - Exaggeration of sizes: hardlink currently counts st_size as
>      saved for every link it replaces, even if st_nlink > 1. I
>      don't know what hadori does there.

hadori does not have statistics. They should be easy to add, but I had no use 
for them.
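
If statistics were added, the over-counting described above could be avoided by crediting an inode's size only once, and only after all of its links have been replaced. A minimal sketch, assuming each replaced entry was lstat()ed before any replacement; the function name and tuple layout are hypothetical.

```python
from collections import Counter


def bytes_freed(replaced: list[tuple[int, int, int, int]]) -> int:
    """`replaced` holds one (st_dev, st_ino, st_nlink, st_size) tuple per
    directory entry that was replaced by a hard link. An inode's blocks are
    released only when its last link goes, so its size counts once, and only
    if every one of its st_nlink links was replaced (links outside the
    scanned tree keep the data alive)."""
    replaced_links: Counter = Counter()
    info: dict[tuple[int, int], tuple[int, int]] = {}
    for dev, ino, nlink, size in replaced:
        replaced_links[(dev, ino)] += 1
        info[(dev, ino)] = (nlink, size)
    return sum(size for key, (nlink, size) in info.items()
               if replaced_links[key] >= nlink)
```

For example, an inode with st_nlink 3 frees its size once when all three entries are replaced, and nothing when only two are.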

> You can also drop your race check. The tool is unsafe on
> changing trees anyway, so you don't need to check whether
> someone else deleted the file, especially if you're then
> linking to it anyway.

I wanted it to exit when something unexpected happens.

> I knew that there were problems on large trees in 2009, but got nowhere with
> a fix in Python. hardlink still makes two passes and thus currently needs to
> keep all the files in memory, because I did not carry the link-first mode
> over from my temporary C++ rewrite: memory usage was not much different in
> my test case. But as my test case was just running on /, it may not be
> representative. If there are lots of duplicates, link-first can definitely
> help.
> 
> The one that works exactly as you want is most likely Fedora's hardlink.

I searched for other implementations, and all of them do two passes when one
is obviously enough.
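
A one-pass, keep-first linker of the kind discussed here can be sketched as follows: the first file seen with a given content becomes the master, and every later identical file is replaced immediately, so only one path per distinct content (plus one entry per inode) stays in memory. This is a hypothetical sketch, not hadori's implementation; it skips the owner/mode checks and *at() hardening discussed above, and the `.hadori-tmp-` prefix is made up.

```python
import filecmp
import os
import stat
import uuid


def _replace(master: str, path: str) -> None:
    """Atomically make `path` a hard link to `master` (temp link + rename)."""
    tmp = os.path.join(os.path.dirname(path), ".hadori-tmp-" + uuid.uuid4().hex)
    os.link(master, tmp)
    try:
        os.rename(tmp, path)
    except OSError:
        os.unlink(tmp)
        raise


def link_first(root: str) -> int:
    """Single pass over `root`; returns the number of links created."""
    masters: dict[tuple[int, int], list[str]] = {}  # (st_dev, st_size) -> masters
    link_to: dict[tuple[int, int], str] = {}        # (st_dev, st_ino) -> master
    created = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                    # deterministic notion of "first"
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue                   # skip symlinks, devices, ...
            key = (st.st_dev, st.st_ino)
            master = link_to.get(key)
            if master is None:
                candidates = masters.setdefault((st.st_dev, st.st_size), [])
                master = next((m for m in candidates
                               if filecmp.cmp(m, path, shallow=False)), None)
                if master is None:
                    candidates.append(path)  # new content: keep this file
                    link_to[key] = path
                    continue
                link_to[key] = master        # later links to this inode follow
            if os.path.samefile(master, path):
                continue                     # already linked to the master
            _replace(master, path)
            created += 1
    return created
```

Keying candidates by (st_dev, st_size) avoids both needless content comparisons and cross-device link() attempts.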

> Yes. It looks readable, but it also has far fewer features than hardlink
> (which were added to hardlink because of user requests).

I still don't see what --maximize (and --minimize) are needed for. In my
incremental full-backup scenario I get the best results with keep-first. When
hardlinking only $last and $dest (see below), even --maximize can disconnect
files from older backups.
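
The three policies can be contrasted in a small sketch, assuming (as hardlink's manual page describes them) that --maximize keeps the file whose inode has the highest link count and --minimize the one with the lowest. The function and mode names are hypothetical.

```python
import os


def choose_master(paths: list[str], mode: str = "first") -> str:
    """Pick the file among identical `paths` that the others will be linked to.
    'first':    keep the first file seen (keep-first);
    'maximize': keep the file whose inode already has the most links;
    'minimize': keep the file whose inode has the fewest links.
    Ties go to the earlier path, since max()/min() return the first best."""
    if mode == "first":
        return paths[0]
    nlinks = [os.lstat(p).st_nlink for p in paths]
    if mode == "maximize":
        best = max(range(len(paths)), key=nlinks.__getitem__)
    elif mode == "minimize":
        best = min(range(len(paths)), key=nlinks.__getitem__)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return paths[best]
```

As a hypothetical illustration: if a $last file shares its inode with one older backup (st_nlink 2) while the identical $dest file has st_nlink 3 because of hard links inside the source, 'maximize' keeps the $dest file, and relinking $last to it leaves the older backup alone on the old inode. Keep-first always keeps the $last copy that older backups already share.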

> > It started with a tree-based map and multimap; now it uses the unordered_
> > (hash-based) versions, which made it twice as fast in a typical workload.
> 
> That's strange. In my (not published) C++ version of hardlink, unordered
> (multi) maps were only slightly faster than ordered ones. I then rewrote
> the code in C to make it more readable to the common DD who does not
> want to work with C++, and more portable.
> 
> And it does not seem right that you spend so much time in the map, at
> least not without caching. And normally, you most likely do not have
> the tree(s) you're hardlinking cached.

I do have them cached, because I usually run:
$ rsync -aH $source $dest --link-dest $last
$ hadori $last $dest


Regards,
Timo


