
re: Non-identical files with identical md5sums on Debian systems?



I do occasionally check for identical files on different systems by
comparing their md5sums. So, just out of interest, could someone tell me
(how to find out) how many non-identical files with identical md5sums
there are on a typical (say, amd64) Debian system?

Assuming the output of md5 behaves like random, uncorrelated 128-bit binary
numbers, and making a couple of other approximations, we can estimate the
expected number with the formula

((n*(n-1))/2)/(2^128)

Where n is the number of unique files on your system.

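I used the command  cat /var/lib/dpkg/info/*.list | wc -l  to get an
approximation of the number of "Debian files" on my main Debian
system, which has lots of stuff installed. I will assume all these files
are unique. (The lists also contain directories, so this overcounts
somewhat, which only makes the estimate more pessimistic.)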

plugwash@debian:~$ cat /var/lib/dpkg/info/*.list | wc -l
304431

So the expected number of md5 collisions would be approximately

((304431*304430)/2)/(2^128)

Plugging that into octave gives us an answer of

octave:1> ((304431*304430)/2)/(2^128)
ans =  1.3618e-28
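
For anyone without Octave to hand, the same arithmetic can be sketched in
Python. The function name expected_collisions and its bits parameter are
my own invention for illustration. The model is the same one assumed
above: each of the n*(n-1)/2 pairs of distinct files collides
independently with probability 2^-128.

```python
from fractions import Fraction


def expected_collisions(n, bits=128):
    """Expected number of colliding pairs among n distinct files.

    Assumes (as in the post) that hash outputs are uniform and
    independent, so each pair collides with probability 2**-bits.
    """
    pairs = n * (n - 1) // 2           # number of unordered pairs of files
    return Fraction(pairs, 2 ** bits)  # exact rational, no rounding yet


if __name__ == "__main__":
    # 304431 is the file count measured with dpkg's .list files above.
    print(f"{float(expected_collisions(304431)):.4e}")  # prints 1.3618e-28
```

Using Fraction keeps the intermediate value exact; the conversion to
float happens only when printing.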

The bottom line is that, under practical conditions, the only way you
are going to see two different files with the same md5 is if someone went
out of their way to create them and send them to you.
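
If you want to check a system empirically anyway, here is a rough
sketch of one way to do it in Python: group files by md5, then compare
the contents of files that share a digest byte for byte. The helper
names md5_of and find_md5_collisions are hypothetical, and this is only
an illustration, not a tested tool. It skips symlinks and unreadable
files, and on a full system it will take a while.

```python
import filecmp
import hashlib
import os
import sys
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_md5_collisions(root):
    """Return pairs of files under root with equal md5 but different bytes."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files; symlinks would just duplicate their targets.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_digest[md5_of(path)].append(path)
            except OSError:
                continue  # unreadable file, skip it

    collisions = []
    for paths in by_digest.values():
        first = paths[0]
        # Compare every other file in the group against the first one.
        for other in paths[1:]:
            if not filecmp.cmp(first, other, shallow=False):
                collisions.append((first, other))
    return collisions


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for a, b in find_md5_collisions(root):
        print(f"same md5, different content: {a} {b}")
```

Going by the estimate above, this should print nothing unless someone
has deliberately planted colliding files.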


