Re: Warning Linux Mint Website Hacked and ISOs replaced with Backdoored Operating System
On Tue 23 Feb 2016 at 16:58:38 (+0100), Nicolas George wrote:
> On quintidi 5 Ventôse, year CCXXIV, David Wright wrote:
> > 1) I do what fdupes does, i.e. identify files (in a benevolent
> >    environment) using the MD5 signature to detect duplicate
> >    contents.
> 
> You did not specify the average size of files nor how sure you want to be.
Just the usual mix of normal user files. Nothing specialised here.
> If the files are large, I would suggest using a sparse hash function, i.e.
> a hash function that reads only small parts of the file, and doing a full
> comparison or computing a strong hash only for files that collide on
> that.
After eliminating uniquely-sized files, I do check the first chunk of
files with identical lengths before I hash them. I think the following
methods are the only ones easily available.
> > >>> hashlib.algorithms_guaranteed
> > {'md5', 'sha1', 'sha224', 'sha512', 'sha384', 'sha256'}
> > >>> hashlib.algorithms_available
> > {'MD4', 'md5', 'md4', 'sha1', 'MD5', 'dsaWithSHA', 'whirlpool', 'sha',
> > 'SHA512', 'SHA256', 'ripemd160', 'sha512', 'SHA384', 'sha384',
> > 'dsaEncryption', 'RIPEMD160', 'sha256', 'SHA224', 'SHA1',
> > 'ecdsa-with-SHA1', 'DSA', 'SHA', 'sha224', 'DSA-SHA'}
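
For illustration, the size, first-chunk, then hash sequence described above
might look roughly like the sketch below. This is not the actual script: the
function names and the 4096-byte chunk size are assumptions made for the
example, and only the choice of MD5 comes from this thread.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 4096  # assumed size of the "first chunk"; not stated in the thread


def _group(paths, key):
    """Bucket paths by key(path); keep only buckets with two or more members."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[key(p)].append(p)
    return [g for g in buckets.values() if len(g) > 1]


def _first_chunk(path):
    """Return the first CHUNK bytes of the file."""
    with open(path, "rb") as f:
        return f.read(CHUNK)


def _md5(path):
    """Return the MD5 hex digest of the whole file, read in 1 MiB blocks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose size, first chunk and MD5 all match."""
    groups = [list(paths)]
    # Cheapest test first: each pass splits the surviving groups further
    # and drops any file that ends up alone.
    for key in (os.path.getsize, _first_chunk, _md5):
        groups = [sub for g in groups for sub in _group(g, key)]
    return groups
```

Each pass only touches files that survived the previous one, so the full
hash is computed for far fewer files than a hash-everything approach.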
> 
> These are all cryptographic hash functions: too strong for a preliminary
> test, insufficient for absolute certainty.
Good enough for me (apart from the specially-crafted pair of letters
from Julius Caesar).
> Still, you can easily benchmark.
Well, md5 beats md4 and sha1, so I guess I'll stick with that for the
time being.
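
A comparison of that kind can be run with something like the following
sketch. The function name bench, the 1 MiB zero-filled buffer and the
repeat counts are arbitrary assumptions for the example; results depend on
the machine and on the OpenSSL build, and some builds refuse md4 entirely.

```python
import hashlib
import timeit

DATA = b"\0" * (1 << 20)  # 1 MiB sample buffer; size chosen arbitrarily


def bench(names=("md4", "md5", "sha1", "sha256"), repeat=5, number=10):
    """Return {name: best time in seconds for `number` digests of DATA}."""
    results = {}
    for name in names:
        try:
            hashlib.new(name)  # probe: raises ValueError if unsupported
        except ValueError:
            continue  # skip algorithms this build cannot provide
        timings = timeit.repeat(
            lambda: hashlib.new(name, DATA).digest(),
            number=number,
            repeat=repeat,
        )
        results[name] = min(timings)  # the minimum is the least noisy figure
    return results


if __name__ == "__main__":
    for name, seconds in sorted(bench().items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {seconds:.4f} s")
```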
Cheers,
David.