
Re: sha1summ of complete directory?



In <20090716151953.GE4628@wks0082.feds.uwaterloo.ca>, Eric Gerlach wrote:
>On Wed, Jul 15, 2009 at 07:36:24AM -0700, Todd A. Jacobs wrote:
>> On Mon, Jul 06, 2009 at 07:30:19PM -0500, Ron Johnson wrote:
>> > How would one go about computing a *single* hash value for a complete
>> > directory tree?
>>
>> You might want to look at how git does this. As I understand it, git
>> stores hashes of trees, so the implementation may help you.
>
>Not really... the hash git indexes with is that of the compressed object
> (which is either a blob, tree, or commit).

Actually, I'm fairly sure it hashes the uncompressed object (now[1]), but 
I'd have to dig into the source code to be sure.
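
For what it's worth, the ID git reports for a blob can be reproduced by 
hashing a short header plus the raw, uncompressed file content.  A minimal 
Python sketch follows; the function name git_blob_id is just for 
illustration and is not part of git.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # The header is "<type> <size in bytes>\0", prepended to the raw content.
    # Compression is applied only when the object is stored, not before hashing.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The second value should match what "echo hello | git hash-object --stdin" 
prints.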

> Tree and commit objects point
> at other objects (which are also stored by hash).  Blobs are the files
> themselves.

That is one way of calculating a single hash for a complete directory tree.  
The tree is identified by its hash, which verifies the contents.  The 
contents identify the "pointed to" objects by hash, which verifies their 
contents.  Etc.
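
A rough sketch of that scheme in Python, for illustration only.  It is 
*not* git's on-disk tree format (it ignores file modes, for one), so the 
result will not match git's tree IDs.  The names hash_file and hash_tree are 
made up here, and it assumes the tree holds only regular files, directories 
and symlinks.

```python
import hashlib
import os

def hash_file(path):
    """SHA-1 digest of a regular file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.digest()

def hash_tree(path):
    """Single SHA-1 digest covering everything below the directory `path`.

    Each directory's hash covers the sorted list of its entries: kind, name
    and the entry's own hash.  Changing any file therefore changes the hash
    of every directory above it, up to the root.
    """
    h = hashlib.sha1()
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.islink(child):
            # Hash the link target itself rather than following the link.
            target = os.fsencode(os.readlink(child))
            kind, digest = b"link", hashlib.sha1(target).digest()
        elif os.path.isdir(child):
            kind, digest = b"tree", hash_tree(child)
        else:
            kind, digest = b"blob", hash_file(child)
        # Names cannot contain NUL and digests are fixed-length (20 bytes),
        # so this record encoding is unambiguous.
        h.update(kind + b" " + os.fsencode(name) + b"\0" + digest)
    return h.digest()
```

Calling hash_tree(".").hex() then gives one hex string for the current 
directory.  Entries are sorted so the result does not depend on the order 
in which the filesystem happens to list them.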

The hash/sum calculated has the same verification properties as a 
single-file data-only hash.  It *might* not be as cryptographically strong, 
but that would be a bit surprising and I've seen no papers/pages verifying 
or refuting its strength.[2]
-- 
Boyd Stephen Smith Jr.           	 ,= ,-_-. =.
bss@iguanasuicide.net            	((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy 	 `-'(. .)`-'
http://iguanasuicide.net/        	     \_/

[1] There was a small period of time during Linus's maintainership of git 
that it hashed differently than it does now.  I can't recall why or when it 
was changed.

[2] Other than the fact that it uses a 160-bit SHA-1 hash and that *may* be 
getting too weak to be considered cryptographically secure in the near 
future.  Using SHA-2 is probably better, and you shouldn't lose much 
strength by truncating to 160 bits if you need that size specifically, but 
git doesn't support that.  Hopefully SHA-3 will be out before it matters, 
which means git can switch to that.[3]

[3] If they ever decide to switch, it will probably be painful.  They might 
not ever switch, since I don't think that resistance against attackers was 
the intent, just "identification" and resistance to random corruption.  (CVS 
and SVN could be silently corrupted for years and it was virtually 
impossible to tell; that doesn't happen to git repositories.)
