Re: Non-identical files with identical md5sums on Debian systems?
On Sun, Aug 04, 2013 at 10:21:09PM -0700, Russ Allbery wrote:
> Fabian Greffrath <firstname.lastname@example.org> writes:
> > I do occasionally check for identical files on different systems by
> > comparing their md5sums. So, just out of interest, could someone tell me
> > (how to find out) how many non-identical files with identical md5sums
> > there are on a typical (say, amd64) Debian system?
> Unless you have a collection of MD5 collision attacks, or have installed a
> package that includes a sample MD5 collision, the chances are quite good
> that the answer is "zero." MD5 is no longer considered cryptographically
> strong, but that doesn't mean it's not a fairly random 128-bit hash. You
> need a *lot* of files before even the birthday paradox will give you much
> likelihood of an MD5 collision that wasn't intentionally constructed.
Let's assume every hard drive produced so far in human history is combined
in a single RAID0 array, and formatted using a typical filesystem without
an inode limit, then filled with small files. If my estimate is correct,
the birthday paradox gives around a 0.001% chance that there will be
at least one non-constructed MD5 collision.
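To make the estimate above reproducible, here is a rough sketch of the usual
birthday-bound approximation, p ~= 1 - exp(-n(n-1)/2N), for n files and
N = 2^128 possible MD5 values. The file count of 8e16 is purely my assumption
for illustration (it is not a measured figure), and the function name is made
up for this example.

```python
import math


def birthday_collision_probability(n_items: int, hash_bits: int = 128) -> float:
    """Approximate chance of at least one accidental collision.

    Assumes the hash behaves like a uniformly random function, which is
    the assumption behind calling MD5 "a fairly random 128-bit hash".
    """
    space = 2 ** hash_bits
    # 1 - exp(-n(n-1)/(2N)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


if __name__ == "__main__":
    # Hypothetical file count, chosen only to illustrate the order of magnitude.
    assumed_files = 8 * 10**16
    p = birthday_collision_probability(assumed_files)
    print(f"{assumed_files:.0e} files -> collision probability {p:.2e}")
```

With that assumed count the result comes out near 1e-5, i.e. roughly 0.001%.
Halving the number of files cuts the probability by about a factor of four,
since it scales with n squared.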
Also, there is no known practical preimage attack against MD5; collision
attacks are considerably less dangerous, as the attacker would first need
to give you a legitimate version of the file she wants to replace.
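
As for the "how to find out" part of the original question, one possible
approach is to group files by MD5 and then compare the files in each group
byte by byte. The following is only a sketch: the function names are
invented for this example, "/usr" is an arbitrary starting directory, and
unreadable files are silently skipped.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of_file(path: str) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_md5_collisions(root: str) -> list:
    """List (reference_file, differing_files) pairs that share one MD5 sum."""
    by_md5 = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files; symlinks would just count their target twice.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_md5[md5_of_file(path)].append(path)
            except OSError:
                continue  # unreadable file: skipped (assumption of this sketch)

    collisions = []
    for paths in by_md5.values():
        if len(paths) < 2:
            continue
        reference = paths[0]
        # shallow=False forces a comparison of the actual file contents.
        differing = [p for p in paths[1:]
                     if not filecmp.cmp(reference, p, shallow=False)]
        if differing:
            collisions.append((reference, differing))
    return collisions


if __name__ == "__main__":
    for reference, differing in find_md5_collisions("/usr"):
        print(reference, "collides with", differing)
```

On a system without deliberately constructed collision samples I would
expect this to print nothing.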