I occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (or tell me how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?
Assuming the output of md5 is a random, uncorrelated 128-bit binary number, and making a couple of other approximations, we can approximate the number with the formula
    ((n*(n-1))/2)/(2^128)

where n is the number of unique files on your system: n*(n-1)/2 is the number of distinct pairs of files, and each pair collides with probability 1/2^128.

I used the command cat /var/lib/dpkg/info/*.list | wc -l to get an approximation of the number of "Debian files" on my main Debian system, which has lots of stuff installed. I will assume all these files are unique.

    plugwash@debian:~$ cat /var/lib/dpkg/info/*.list | wc -l
    304431

So the expected number of md5 collisions would be approximately ((304431*304430)/2)/(2^128). Plugging that into octave gives us an answer of

    octave:1> ((304431*304430)/2)/(2^128)
    ans = 1.3618e-28

The bottom line is that under practical conditions, the only way you are going to see two files with the same md5 is if someone went out of their way to create them and send them to you.
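
For anyone who wants to redo the estimate with their own file count, here is a minimal Python sketch of the same calculation. The function name expected_collisions is just something made up for illustration, and it relies on the same assumption as above: that md5 digests behave like independent, uniformly random 128-bit numbers.

```python
from fractions import Fraction

MD5_BITS = 128  # an MD5 digest is 128 bits long


def expected_collisions(n: int, bits: int = MD5_BITS) -> float:
    """Expected number of colliding pairs among n distinct files.

    Assumption: digests are independent and uniformly random over
    2**bits values, so each pair collides with probability 1 / 2**bits.
    """
    pairs = n * (n - 1) // 2  # number of distinct pairs of files
    # Use exact rational arithmetic, converting to float only at the end.
    return float(Fraction(pairs, 2 ** bits))


if __name__ == "__main__":
    n = 304431  # file count taken from the dpkg .list files above
    print(f"{expected_collisions(n):.4e}")
```

With n = 304431 this should agree with the octave figure above. Replace n with the output of the wc -l command on your own system to get your own estimate.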