
Re: Warning Linux Mint Website Hacked and ISOs replaced with Backdoored Operating System



Hi,

Henrique de Moraes Holschuh wrote:
> MD5 alone can be somewhat dangerous even in benevolent environments: if the
> data sets are large enough or you are just unlucky,

The size of the data set does not matter much.
As already stated, there is the Pigeonhole Principle, which tells
us that a 128 bit hashsum cannot be the mapping result of all inputs
of 129 bit without having at least 2 exp 128 collisions. On average each
MD5 value will appear twice among the messages of 129 bit length.

Since MD5 is computed bytewise, each possible MD5 value will appear
about 256 times among the set of all 17 byte inputs (136 bits, so
2 exp 8 times as many inputs as hash values).
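
The same counting argument can be tried at a scale that fits in memory.
This is only a sketch under an assumption of mine: MD5 is truncated to
its first byte (256 possible values) and fed all 2 byte inputs, as a
stand-in for 16 byte hashes and 17 byte inputs. The helper name
truncated_md5 is made up for this illustration.

```python
import hashlib
from collections import Counter

def truncated_md5(data: bytes) -> int:
    """Toy hash: the first byte of the MD5 digest (256 possible values)."""
    return hashlib.md5(data).digest()[0]

# All 2 byte inputs: 2**16 messages mapped onto at most 2**8 hash values.
counts = Counter(truncated_md5(i.to_bytes(2, "big")) for i in range(2 ** 16))

# Average number of inputs per possible hash value.
average = sum(counts.values()) / 256
# The pigeonhole principle guarantees that some value is hit at least
# this often, no matter how good the hash function is.
fullest = max(counts.values())

print(average, fullest, min(counts.values()))
```

The average is fixed by counting alone. How far the fullest and the
emptiest bucket deviate from it is what depends on the hash function.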


Nevertheless the size of 128 is considered sufficient for UUID.
  https://en.wikipedia.org/wiki/Universally_unique_identifier
says:

  A UUID is simply a 128-bit value. [...]
  The intent of UUIDs is to enable distributed systems to uniquely
  identify information without significant central coordination.
  In this context the word unique should be taken to mean
  "practically unique" rather than "guaranteed unique".

Then comes some math:
  https://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates

  "only after generating 1 billion UUIDs every second for the next 100 years,
   the probability of creating just one duplicate would be about 50%."
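
The quoted figure can be checked roughly with the usual birthday
approximation p ~ 1 - exp(-n^2 / (2 * 2^bits)). Assumption not taken from
the quote: a random (version 4) UUID has 122 random bits, the other 6
being fixed. The function name is made up for this sketch.

```python
import math

def collision_probability(n: float, bits: int) -> float:
    """Birthday approximation for n random values out of 2**bits."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# 1 billion values per second for 100 years.
n = 1e9 * SECONDS_PER_YEAR * 100

p_uuid4 = collision_probability(n, 122)  # assumed: 122 random bits
p_full = collision_probability(n, 128)   # all 128 bits random, like a hash

print(p_uuid4, p_full)
```

With 122 random bits the result lands in the neighbourhood of one half,
matching the quoted order of magnitude. With all 128 bits random, as
with an MD5 sum, it is only a few percent.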

The only "practical" objection against MD5 in this context would be
that we first have to prove its quality of uniformly distributing the
results over the space of 2 exp 128 possible values.
  https://en.wikipedia.org/wiki/Hash_function#Uniformity

I am not aware that MD5 has been accused of lacking this property.
It also yields the "practical" randomness which UUID presumes.
As I read the table in
  http://michiel.buddingh.eu/distribution-of-hash-values
MD5 has denser and sparser regions, but is still quite uniform.
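
A crude way to look at uniformity oneself is a chi-square statistic over
bucket counts. This is a sketch of my own and not the method of the
linked page: the inputs are just the decimal strings of a counter, only
the first digest byte is examined, and the sample size is arbitrary.

```python
import hashlib

def first_byte_chi_square(samples: int) -> float:
    """Chi-square statistic of the first MD5 digest byte over 256 buckets."""
    buckets = [0] * 256
    for i in range(samples):
        first_byte = hashlib.md5(str(i).encode()).digest()[0]
        buckets[first_byte] += 1
    expected = samples / 256
    return sum((count - expected) ** 2 / expected for count in buckets)

chi2 = first_byte_chi_square(100_000)
print(chi2)
```

For a uniform distribution the statistic should stay near 255, the
number of degrees of freedom. Values far above that would point to
regions which are much denser or sparser than chance explains.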


As already stated too, the Birthday Paradox hits when you have created a
large collection of MD5 sums and keep adding more. You have to expect the
first collision after about 2 exp 64 values. (That is about the probability
of getting stomped into the earth by a 10 km asteroid before tomorrow
evening. So take this as a benchmark for whether you should be worried
about MD5 collisions.)
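
The 2 exp 64 estimate follows from the birthday bound: a collision
becomes about 50% likely after sqrt(2 * ln(2) * 2^bits) random values.
The sketch below computes that number for 128 bits and, as a small
experiment, searches for the first collision of MD5 truncated to 32 bits.
The truncation and both function names are assumptions made for
illustration only.

```python
import hashlib
import math

def values_for_half_chance(bits: int) -> float:
    """Number of random values at which a collision is about 50% likely."""
    return math.sqrt(2.0 * math.log(2.0) * 2.0 ** bits)

def first_collision(bits: int) -> int:
    """Hash the strings "0", "1", "2", ... with MD5 truncated to `bits`
    bits and return how many inputs were hashed when a value repeated."""
    seen = set()
    i = 0
    while True:
        digest = hashlib.md5(str(i).encode()).digest()
        value = int.from_bytes(digest, "big") >> (128 - bits)
        if value in seen:
            return i + 1
        seen.add(value)
        i += 1

full = values_for_half_chance(128)  # a little above 2**64
small = first_collision(32)         # expected to be in the region of 2**16

print(full / 2 ** 64, small)
```

The experiment only works because 32 bits is tiny. Each additional bit
of hash length multiplies the expected effort by the square root of 2.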

I have also read a good argument in this thread about the lifetime
of contemporary hardware. It gives you little chance to process 2 exp 64
data sets of 16 bytes (2 exp 68 bytes, about 275 billion GiB).
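
The arithmetic behind that volume, as a sketch. The throughput of
1 GB per second is purely a hypothetical figure chosen for illustration,
not a measurement of any real machine.

```python
# 2**64 data sets of 16 bytes each.
total_bytes = 2 ** 64 * 16           # equals 2**68 bytes

gib = total_bytes // 2 ** 30         # binary gigabytes (GiB)
gb = total_bytes / 1e9               # decimal gigabytes (GB)

# Hypothetical sustained processing rate, assumed for illustration only.
ASSUMED_BYTES_PER_SECOND = 1e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = total_bytes / ASSUMED_BYTES_PER_SECOND / SECONDS_PER_YEAR

print(gib, gb, years)
```

At the assumed rate a single machine would need several thousand years,
which is the point of the hardware lifetime argument.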


Have a nice day :)

Thomas

