
Re: [RFC] Enhance checksum support



Frank Lichtenheld wrote:
On Fri, Jan 18, 2008 at 11:38:55PM +1000, Anthony Towns wrote:
It'd actually be good to be able to break Files in future, so that we're
forced to verify something other than md5sum. Otherwise there will
be code that doesn't check it properly, and that will end up being a
security problem.

Hmm, that might indeed be a good idea (the point to remove the Files
field would be v3 then).

Having it be:

  Contents: sha256
   28ee6a10eb280ede4b19c1b975aff5533016a26de67ba9212d51ffaea020ce34 355 foo
  Files:
   4bf7ff17bd9ddf3846d9065b3c594fb4 355 foo

or similar would be nice and non-redundant, and make it possible to drop

I can see the "nice". But once I want to include more than one checksum
it quickly gets redundant.

So maybe keep the Checksums field and introduce a Contents field that
contains no checksums, but only the size and the name?

Checksums:
  md5 4bf7ff17bd9ddf3846d9065b3c594fb4 foo
  sha256 28ee6a10eb280ede4b19c1b975aff5533016a26de67ba9212d51ffaea020ce34 foo
Contents:
  355 foo
Files:
  4bf7ff17bd9ddf3846d9065b3c594fb4 355 foo

That makes the parsing more robust and eliminates the need to specify
the size of a file more than once. If we want we could even declare size
also to be a checksum and include only the filenames in the
Contents field.
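
A minimal sketch of how a tool might read the layout above, assuming each Checksums line is "algorithm digest name" and each Contents line is "size name". The helper names (parse_fields, index_files, verify) are hypothetical and not taken from any existing tool, and file names containing whitespace are not handled.

```python
import hashlib


def parse_fields(text):
    """Split a stanza into {field name: [tokens of each continuation line]}."""
    fields, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented lines belong to the most recent field.
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current].append(line.split())
        else:
            # Assumption: the field line itself carries no inline value.
            current = line.partition(":")[0].strip()
            fields[current] = []
    return fields


def index_files(fields):
    """Return {name: {"size": int, "sums": {algorithm: hex digest}}}."""
    files = {name: {"size": int(size), "sums": {}}
             for size, name in fields["Contents"]}
    for algo, digest, name in fields["Checksums"]:
        if name not in files:
            raise ValueError(f"checksum for unlisted file {name!r}")
        files[name]["sums"][algo] = digest
    return files


def verify(data, entry, required="sha256"):
    """Check size and every listed digest; insist that `required` is present."""
    if required not in entry["sums"] or len(data) != entry["size"]:
        return False
    return all(hashlib.new(algo, data).hexdigest() == digest
               for algo, digest in entry["sums"].items())
```

Because the size lives only in Contents, a reader cannot run into two conflicting sizes for the same file, and refusing files that lack the required algorithm keeps a reader from silently falling back to md5 alone.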

Regards,
Isn't sha256 a little much for a file of this size? Would it be worth using a smaller hash for smaller files? With both lines you are storing 122 bytes to uniquely identify a 355 byte file named foo. If you really need multiple checksums, why not do something more along these lines:

Checksums: sha1 sha256 sha_N
 - {sha256} - foo
 {sha1} - - bar
Files:
 {md5} 355 foo
 {md5} 10 bar
 {md5} 1 baz

You waste less space identifying the hash, and it is still easy to parse. I assume the Files section cannot be broken and has to keep the "md5 size name" format for older/unsupported tools.
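
To illustrate the "easy to parse" claim, here is a rough sketch of a reader for the column layout above. It assumes the Checksums header names the algorithms in column order, each following line holds one digest or "-" per algorithm, and the last token is the file name. The function name parse_checksum_table is hypothetical, and file names containing whitespace are not handled.

```python
def parse_checksum_table(text):
    """Return {file name: {algorithm: digest}} for a column-style Checksums field."""
    lines = [line for line in text.splitlines() if line.strip()]
    field, _, algo_list = lines[0].partition(":")
    if field.strip() != "Checksums":
        raise ValueError("expected a Checksums field")
    algos = algo_list.split()

    table = {}
    for row in lines[1:]:
        # Everything but the last token is a digest column.
        *digests, name = row.split()
        if len(digests) != len(algos):
            raise ValueError(f"wrong number of columns for {name!r}")
        # "-" marks an algorithm that is not given for this file.
        table[name] = {a: d for a, d in zip(algos, digests) if d != "-"}
    return table
```

A file that appears only in Files, such as baz above, simply has no entry in the returned table, so a caller can decide whether the md5 there is enough on its own.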

