Re: Checksums on ftp
On Wed, Apr 26, 2000 at 07:05:18PM -0000, Sascha_M._Silbe@progbbs.dyn.ns1.net wrote:
> 26 Apr 00 12:45, Alexander Hvostov wrote to UUCP:
> AH> Yeah, yeah, you just try and break an MD5 checksum anytime this
> AH> year. *cough*
> It'll take some time, but it's possible. A simple brute-force attack will do
> the job. And "some time" depends VERY much on the hardware the cracker owns. It
> may be enough to be secure against your friends, but not against a "real"
It still only scales linearly with the speed of hardware available. You
would have to have more funding than the NSA to get that much hardware! Is
there any reason to believe it could be done in less than the age of the
universe (~1e10 years) for 128bit MD5?
Read RFC 1321 and tell me what you think then. :)
You have to make on the order of 2^128 guesses to come up with a message
that has the same hash as the file you're trying to spoof. (you don't get
the advantage of the "birthday paradox" (29 people in a room -> 50% chance
at least one pair has the same birthday) because the other member of the
pair is already picked: it is the md5 hash of the original file.
$ bc -l
2^128 / 10000000000 / 365 / 24 / 60 /60
so, you'd have to perform 10^21 md5 hashes per second to get as many hashes
as there are possible hashes. (you wouldn't exactly "exhaust the key
space", since the "key" is the message you are hashing, and it is _much_
longer than 128 bits.) Since MD5 works on chunks of 64 bytes, you could
mess with the last few bytes only and not have to run MD5 on the whole file
every time, just the last 64-padlength bytes and the pad bytes. (You would
pre-compute the first bit of the file.) This lowers the time required to
check a variation of each computation a lot (especially for long files), but
not enough. Also, if you try to exhaust the (effectively 64byte) keyspace,
you won't have any luck at all! That's 2^512 different possibilities!)
On my P200MMX, md5sum takes at best 0.85sec on a 12724349 byte MP3 file.
To md5 transform one 64byte chunk takes
12724349 / 64 + 1
0.85 / .
i.e. 4 microseconds = 4*10^-6 ~= 2^(-18) (10^3 ~= 2^10, coincidentally)
Thus, on my computer, assuming you could generate a test m5dsum in the time
it takes to do a single md5 block, it would take 2^(128-18)=2^110 seconds to
have a pretty good chance of stumbling upon a bit pattern that worked.
That's a long time:
2^110 / 365 / 24/ 60 / 60
10^25 years is a damn long time. The universe is only on the order of 10
billion years old! RC5 is a similar algorithm to MD5 (designed by the same
guys, RSA), consisting of rotates and bit shifts, so, given that I get 430
kkeys/s on my computer, and that www.distributed.net reports 147 Gkeys/s for
RC5, if we had every computer that's running RC5 trying to find a hash
collision for a given hash,
41161663325523430591470829.6012501268 * 430/147 / 1000000
That's still 10^20 years, which is a damn long time. You would be hard
pressed to find enough computing power to do this, no matter who you are.
(even the NSA couldn't do it in a year. I'm sure they don't have 20 orders
of magnitude more computing power than d.net. Also, MD5 is somewhat
complicated, so you would have a hard time building dedicated hardware much
faster than a general purpose CPU.) (BTW, if you don't know about
distributed computing, check out www.distributed.net, and www.dcypher.net.
dcypher's got a real-world useful project: simulating gamma rays to help
figure out how to build better nuclear waste containers. Too bad they only
run on x86: linux, *bsd, or windoze.)
Even if you did find a way to change a file and preserve its MD5 digest,
you might have a hard time making use of that. Assume you've made some
changes to a file, and then you want to adjust it back to the original MD5
digest. You would have to find a block of data that you could set
arbitrarily, otherwise changing the bytes to make the digest right would
make the file invalid. If you did this with a shell script, you could mess
with a comment longer than 64 bytes (as long as the bytes didn't turn out to
include \n, which would end the comment, wasting the billions of
universe-ages of CPU time you spent calculating it.) For a binary file, you
would have to make the whole rest of the file smaller so you could add 64
bytes somewhere that got ignored. (if you change the length of the file,
that will get noticed along with the md5sum, most likely!)
Some people have made some progress in cryptanalysis of md5sum, but I
haven't heard about this affecting the likelyhood of finding collisions for
a given hash value. If this is not the case, (i.e. collisions are a lot
more likely than 1 in 2^128, or for some other reasone it takes orders of
magnitude less than 2^128 guesses before you find a message with the same
digest as some other _already specified_ message), I would really like to
hear about it.
BTW, what makes you think "real crackers" have super fast hardware at their
disposal? Do you think they cracked into NASA or the NSA or something?
AFAIK, most crackers are just badasses with an internet connection and a PC.
(almost always x86, but I suppose some have the sense to get a Mac and run
some kind of Unix on it.) (I think _mostly_ only script kiddies try "hack"
from non-Unix machines. Sillies :)
Besides, I'm almost certain that no system cracker would bother to get the
md5 digests the same on all the files they changed, since most people don't
check. I'd say you would be able to find changed files > 99% of the time,
and either you wouldn't find any changed, which would mean a _very_
sophisticated cracker, or you would find every file she changed. (the
chance of one changed file randomly staying unchanged is 1/(2^128))
Happy hacking :)
#define X(x,y) x##y
Peter Cordes ; e-mail: X(email@example.com. , ns.ca)
"The gods confound the man who first found out how to distinguish the hours!
Confound him, too, who in this place set up a sundial, to cut and hack
my day so wretchedly into small pieces!" -- Plautus, 200 BCE