Re: Still busy...
On Sat, Jan 06, 2001 at 09:24:49PM +0100, J.A. Bezemer wrote:
> One idea that might be useful: You probably cut each file in X-byte
> blocks and compare checksums.
I'm doing better than that: The tool uses something similar to rsync's
algorithm to find the files at *any* byte offset! 8-]
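For illustration, here is a minimal Python sketch of the rolling-checksum idea. It uses the classic 32-bit rsync-style weak checksum, not the tool's own variant. The names weak_checksum, roll and find_offsets are made up for this example, and a direct byte comparison stands in for the stronger check a real tool would use.

```python
MOD = 1 << 16  # rsync-style weak checksum keeps two 16-bit halves


def weak_checksum(block):
    """Return the (a, b) pair of an rsync-style weak checksum of block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_offsets(data, needle_start):
    """Return every byte offset in data where needle_start occurs."""
    n = len(needle_start)
    if n == 0 or len(data) < n:
        return []
    target = weak_checksum(needle_start)
    a, b = weak_checksum(data[:n])
    hits = []
    for off in range(len(data) - n + 1):
        # Cheap checksum test first; confirm with a real comparison
        # (assumption: a real tool would confirm with MD5 instead).
        if (a, b) == target and data[off:off + n] == needle_start:
            hits.append(off)
        if off + n < len(data):
            a, b = roll(a, b, data[off], data[off + n], n)
    return hits
```

Since the window advances one byte at a time at constant cost, a match can be found at any offset, not only at block boundaries.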
The reason I allow matching at any offset (or rather, invest more work
in the more complicated algorithm) is that it lets the system be used
in many more ways than just for CD images. Some of the other uses:
- DVD images with a UFS(?) filesystem on them
- Huge files of some other kind (e.g. staroffice.bin.gz, a whopping
93MB;-) that have been given to the "split" command
- "zip -0" files
> Since we're talking about terribly many files that we don't want to
> checksum more than once, it may prove worthwhile to scan all files at
> once and save the results in a temporary file (checksum <->
> filename/offset; sorted?) and use that to compare against.
This occurred to me, too. Due to the special quirks of how it works,
it needs to save the following for each file:
- Rolling checksum of the first bytes (a 64-bit extended, more
secure version of rsync's checksum)
- MD5sums of fixed-size chunks that make up the file
- MD5sum of the whole file
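The three items above could be gathered into one record per file, roughly as in the Python sketch below. Everything named here is an assumption for illustration: file_record, head_len and chunk_size are hypothetical, and head_checksum is only a stand-in that widens the rsync-style sums to 32 bits each, not the tool's actual 64-bit checksum.

```python
import hashlib

MASK32 = (1 << 32) - 1


def head_checksum(head):
    """Stand-in 64-bit checksum: two rsync-style sums, 32 bits each."""
    a = b = 0
    for x in head:
        a = (a + x) & MASK32
        b = (b + a) & MASK32
    return (b << 32) | a


def file_record(data, head_len=4096, chunk_size=65536):
    """Build the per-file record: head checksum, chunk MD5s, whole MD5."""
    chunk_md5 = [
        hashlib.md5(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    return {
        "size": len(data),
        "head_checksum": head_checksum(data[:head_len]),
        "chunk_md5": chunk_md5,
        "file_md5": hashlib.md5(data).hexdigest(),
    }
```

A real tool would read the file in pieces instead of holding it in memory; the in-memory form just keeps the sketch short.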
These "MD5sums of fixed-size chunks" are necessary because I put some
effort into ensuring that the image file can be fed into stdin - that
way, you can pipe directly from mkhybrid into it, which should come in handy.
> Oh, and you probably already know that files in an iso image 1) can
> only start at 2048-byte multiples
I don't care. ;-)
> and 2) are always contiguous (no fragmentation).
I'm relying on this.
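Because a file's bytes are contiguous in the image, a candidate match can be checked by reading straight ahead from the stream, one chunk at a time. The Python sketch below shows one way this might look. It assumes the chunk MD5s, whole-file MD5 and size were stored earlier; verify_stream and its parameters are hypothetical names, not the tool's interface.

```python
import hashlib
import io


def verify_stream(stream, chunk_md5s, file_md5, size, chunk_size):
    """Read size bytes from stream and check them chunk by chunk.

    Assumes stream.read(n) returns n bytes unless EOF is reached, as
    buffered binary streams such as io.BytesIO do.
    """
    whole = hashlib.md5()
    remaining = size
    for expected in chunk_md5s:
        want = min(chunk_size, remaining)
        chunk = stream.read(want)
        # Stop at the first bad chunk instead of reading the whole file.
        if len(chunk) != want or hashlib.md5(chunk).hexdigest() != expected:
            return False
        whole.update(chunk)
        remaining -= want
    return remaining == 0 and whole.hexdigest() == file_md5


if __name__ == "__main__":
    payload = b"abcdefghij"
    md5s = [hashlib.md5(payload[i:i + 4]).hexdigest() for i in (0, 4, 8)]
    image = io.BytesIO(payload + b"rest of image")
    print(verify_stream(image, md5s, hashlib.md5(payload).hexdigest(), 10, 4))
```

A mismatch shows up after at most one chunk, which is presumably why per-chunk sums help when the image arrives through a pipe and cannot be re-read.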
All the best,
|_) /| Richard Atterer
| \/¯| http://atterer.net
¯ ´` ¯