Re: Still busy...

To: debian-cd@lists.debian.org
Subject: Re: Still busy...
From: Richard Atterer <deb-cd@list.atterer.net>
Date: Sun, 7 Jan 2001 01:27:31 +0100
Message-id: <20010107012731.B15422@atterer.net>
Mail-followup-to: debian-cd@lists.debian.org
In-reply-to: <Pine.LNX.3.96.1010106210816.26834A-100000@panic.et.tudelft.nl>; from costar@panic.et.tudelft.nl on Sat, Jan 06, 2001 at 09:24:49PM +0100
References: <20010105231705.A8086@atterer.net> <Pine.LNX.3.96.1010106210816.26834A-100000@panic.et.tudelft.nl>

Hi Anne,

On Sat, Jan 06, 2001 at 09:24:49PM +0100, J.A. Bezemer wrote:
> One idea that might be useful: You probably cut each file in X-byte
> blocks and compare checksums.

I'm doing better than that: The tool uses something similar to rsync's
algorithm to find the files at *any* byte offset! 8-]
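
To illustrate the general idea (this is only a sketch, not the tool's
actual code): a rolling checksum in the style of rsync's weak checksum
can be updated in constant time as the window slides one byte forward,
so every byte offset of the image can be tested cheaply. The names
weak_checksum, roll and find_offsets are made up for this example, the
16-bit modulus follows rsync's weak checksum rather than the 64-bit
variant, and the direct byte comparison stands in for a strong hash
check.

```python
MOD = 1 << 16  # rsync-style 16-bit halves; an assumption for this sketch


def weak_checksum(block):
    """Return the (a, b) pair of an rsync-style weak checksum."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_offsets(haystack, needle_head):
    """Find every byte offset at which needle_head occurs in haystack."""
    n = len(needle_head)
    if n == 0 or len(haystack) < n:
        return []
    target = weak_checksum(needle_head)
    a, b = weak_checksum(haystack[:n])
    hits = []
    for off in range(len(haystack) - n + 1):
        # Cheap test first; confirm candidates with a real comparison.
        if (a, b) == target and haystack[off:off + n] == needle_head:
            hits.append(off)
        if off + n < len(haystack):
            a, b = roll(a, b, haystack[off], haystack[off + n], n)
    return hits


image = b"\x00" * 5 + b"hello world" + b"\x00" * 3 + b"hello world"
print(find_offsets(image, b"hello world"))  # [5, 19]
```

Neither match sits on a block boundary, which is the point: nothing in
the search assumes any alignment.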

The reason I allow this (or rather, invest the extra work in the
more complicated algorithm) is that it lets the system be used in
many more ways than just for CD images. Some of the other
applications:

- DVD images with a UDF filesystem on them
- Huge files of some other kind (e.g. staroffice.bin.gz, a whopping
  93MB;-) that have been given to the "split" command
- "zip -0" files
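
The "zip -0" case can be checked with a small experiment. The sketch
below uses Python's zipfile module with ZIP_STORED (the equivalent of
"zip -0") and shows that the member's bytes appear verbatim in the
archive, at an offset that is not aligned to any block size. The helper
stored_zip and the name demo.bin are made up for this example.

```python
import io
import zipfile


def stored_zip(name, payload):
    """Build an in-memory zip archive holding one uncompressed member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


payload = bytes(range(256)) * 8          # 2048 bytes of test data
archive = stored_zip("demo.bin", payload)

# The stored member is a contiguous, unmodified copy of the payload,
# preceded by the zip local file header.
offset = archive.find(payload)
print("payload found at byte offset", offset)
```

Because the data sits behind a variable-length header, a matcher that
only looks at fixed block boundaries would miss it.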

> Since we're talking about a great many files that we don't want to
> checksum more than once, it may prove worthwhile to scan all files at
> once and save the results in a temporary file (checksum <->
> filename/offset; sorted?) and use that to compare against.

This occurred to me, too. Because of the way the algorithm works,
the tool needs to save the following for each file:

- Rolling checksum of the first bytes (a 64-bit extended, more
  secure version of rsync's checksum)
- MD5sums of fixed-size chunks that make up the file
- MD5sum of the whole file
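
As a rough sketch, the three items above could be kept in a record like
the following. Everything specific here is an assumption made for the
example: the names FileRecord, head_checksum64 and make_record, the
head length and chunk size, and above all the 64-bit checksum, which is
just two 32-bit rsync-style sums packed together and not the tool's
real formula.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileRecord:
    head_checksum: int   # rolling checksum of the first bytes
    chunk_md5s: list     # MD5 of each fixed-size chunk, in order
    file_md5: str        # MD5 of the whole file


def head_checksum64(head):
    """Placeholder 64-bit checksum: two 32-bit rsync-style sums."""
    mod = 1 << 32
    n = len(head)
    a = sum(head) % mod
    b = sum((n - i) * x for i, x in enumerate(head)) % mod
    return (b << 32) | a


def make_record(data, head_len=4096, chunk_size=65536):
    """Compute the cached per-file data for one file's contents."""
    chunks = [
        hashlib.md5(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    return FileRecord(
        head_checksum=head_checksum64(data[:head_len]),
        chunk_md5s=chunks,
        file_md5=hashlib.md5(data).hexdigest(),
    )


record = make_record(b"abc" * 1000, head_len=16, chunk_size=1024)
print(len(record.chunk_md5s), record.file_md5)
```

Writing such records to a temporary file means each input file is read
and checksummed only once.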

These "MD5sums of fixed-size chunks" are necessary because I put some
effort into ensuring that the image file can be fed into stdin - that
way, you can directly pipe from mkhybrid into it, which should come in
very handy.
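
A sketch of why per-chunk MD5sums help with piped input, under my own
assumptions about the details: a pipe cannot be rewound, so after a
candidate match the data has to be checked as it streams past, and
per-chunk sums let a mismatch be detected after one chunk rather than
at the end of the file. The function names and the chunk size are
invented for this example, and io.BytesIO stands in for stdin.

```python
import hashlib
import io


def chunk_md5s(data, chunk_size):
    """MD5 of each fixed-size chunk of data (the last may be shorter)."""
    return [
        hashlib.md5(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def verify_from_stream(stream, expected, file_len, chunk_size):
    """Check the next file_len bytes of a non-seekable stream."""
    remaining = file_len
    for want in expected:
        n = min(chunk_size, remaining)
        block = stream.read(n)
        # Stop at the first bad chunk; only forward reads are used.
        if len(block) != n or hashlib.md5(block).hexdigest() != want:
            return False
        remaining -= n
    return remaining == 0


data = b"x" * 2500
sums = chunk_md5s(data, 1000)

good = io.BytesIO(data + b"rest of image")
print(verify_from_stream(good, sums, len(data), 1000))   # True

bad = io.BytesIO(b"x" * 1500 + b"y" * 1000)
print(verify_from_stream(bad, sums, len(data), 1000))    # False
```

With a real pipe, sys.stdin.buffer would take the place of the BytesIO
objects.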

> Oh, and you probably already know that files in an iso image 1) can
> only start at 2048-byte multiples

I don't care. ;-)

> and 2) are always contiguous (no fragmentation).

I'm relying on this.

All the best,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer
  | \/¯|  http://atterer.net
  ¯ ´` ¯
