
Checksumming tool



On Mon, 2005-11-28 at 19:07 +1000, Anthony Towns wrote:
> Hrm, if we're writing our own thing, maybe we should do it properly:
> have a single program that can do multiple hash algorithms, have the
> default hash be secure, and update it in future, and so on.

As it happens, I've been wanting a really nice checksum program for a
couple of years now. When I burn a CD or DVD with files, I put a
checksum file ("md5sum.txt" usually) at the root, so that I can easily
check that the disk is still working years later. It would be nice to
not have a zillion different, incompatible checksum tools.

The md5sum program isn't very user friendly, and my main motivation has
been better usability (feedback on how much longer the check will take,
and stuff like that). I have, however, also thought about checksum
algorithms other than MD5, and about a format that is extensible enough
that it won't need to be changed every time the algorithm changes. A few
thoughts:

        1. Definitely use URL encoding for filenames. It's cheap, well
        known, and usually not needed (% being a nicely rare character),
        and sometimes it really is important to be able to deal with
        pathnames with weird characters.
        
        2. A little bit of verbosity doesn't add very much to the file
        size, and will make dealing with the files much easier.
        
        3. Who knows what else one might want in the file later, so the
        format should leave room for new fields.
        
        4. md5sum and sha1sum compatibility is pretty much required for
        the new tool. It makes the transition tolerable. I don't feel
        new files need to be backwards compatible by default, however.
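
To make point 1 concrete, here is a minimal Python sketch of the filename
encoding. The function names are made up for illustration, and it assumes
filenames are valid UTF-8 text; names that are not would need to be
handled as bytes instead.

```python
from urllib.parse import quote, unquote

def encode_filename(name):
    # Keep "/" readable so paths stay legible; everything else that is
    # not a plain letter, digit or one of "_.-~" gets percent-encoded,
    # including "%" itself, spaces and newlines.
    return quote(name, safe="/")

def decode_filename(encoded):
    # Inverse of encode_filename().
    return unquote(encoded)

print(encode_filename("foo bar/hellurei.txt"))  # foo%20bar/hellurei.txt
```

Most ordinary filenames come out unchanged, which is the "usually not
needed" part.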
        
The best I've come up with so far is a pseudo rfc822 syntax:

        File: foo%20bar/hellurei.txt
        Size: 12345
        MD5: 012345667
        SHA-256: 0a0a0a0a0a0a0a0a0a0a0a0a
        Mode: 0644

Empty lines separate blocks of "headers" for different files. This
should all be very simple to use and immediately logical and familiar to
anyone who's seen e-mail headers.
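
As a rough illustration of producing one such block, here is a Python
sketch. It is not the code of my actual tool; the function name
describe_file and the choice of fields are just assumptions that mirror
the example above.

```python
import hashlib
import os
import stat
from urllib.parse import quote

def describe_file(path):
    """Return one block of headers describing the file at path."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    # Read in chunks so large files (CD/DVD images) need not fit in RAM.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
            sha256.update(chunk)
    st = os.stat(path)
    lines = [
        "File: " + quote(path, safe="/"),
        "Size: %d" % st.st_size,
        "MD5: " + md5.hexdigest(),
        "SHA-256: " + sha256.hexdigest(),
        # Permission bits only, in octal, e.g. 0644.
        "Mode: %04o" % stat.S_IMODE(st.st_mode),
    ]
    return "\n".join(lines) + "\n"
```

A complete checksum file would then be these blocks joined with an empty
line between them.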

It would be cool for file(1) or GNOME's and KDE's MIME type heuristics
to easily recognize the format, so that (eventually) a GUI tool can be
written to deal with such files.

I put an old draft of the manual page for my work-in-progress tool at
http://liw.iki.fi/liw/temp/summain.txt and the bzr (a.k.a. bazaar-ng)
repository at http://liw.iki.fi/liw/bzr/ in case anyone is interested. I
don't have all that much code (this being a project I hack on whenever I
don't have anything useful to do, like reading Debian mailing lists). It
tends to get rewritten every now and then (happiness is going NIH on
your own code). It shouldn't take that much effort to write the tool, so
most of my efforts have gone into thinking about exactly the right
command line user interface features, and about the prettiest
implementation design.

I'm also writing it in Python, whereas a pure C version would probably
be preferable for Debian's purposes. The file format is more important
at this point, though, and any sensible file format should be quite
simple to support in any language.
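
To give an idea of how little code reading the pseudo rfc822 format
takes, here is a Python sketch of a parser. The name parse_checksum_file
is hypothetical, and the sketch assumes one "Name: value" pair per line
with no continuation lines, which the draft format does not yet specify.

```python
from urllib.parse import unquote

def parse_checksum_file(text):
    """Return a list of dicts, one per block of headers."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # An empty line ends the current block, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a header line: %r" % line)
        current[name.strip()] = value.strip()
    if current:
        records.append(current)
    # Filenames are stored URL-encoded; decode them for the caller.
    for record in records:
        if "File" in record:
            record["File"] = unquote(record["File"])
    return records
```

Unknown header names are simply kept in the dict, so a reader written
today would not choke on fields added later.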

-- 
The most difficult thing in programming is to be simple and
straightforward.


