Re: JTE (Jigdo Template Export) v1.0

On Thu, Jun 10, 2004 at 12:16:23AM +0200, Richard Atterer wrote:
>On Wed, Jun 09, 2004 at 05:34:03PM +0100, Steve McIntyre wrote:
>> Unfortunately, I don't see us (quite) getting that far. To generate the
>> md5 of the full image file (which is kind of useful), we need to read all
>> of the data through anyway. You can't simply lump together multiple md5
>> chunks.
>Ah - indeed, you're right. :-/
>Hmmm. If the ability to create images this fast turns out to be a
>"must-have" feature one day, the template format could be changed: Either
>the image md5sum field could be left at zero to indicate "no md5sum
>available", or we could switch over to using what I call a "64 bit rsync
>sum" - a (cryptographically weak) checksum which allows "lumping together", 
>already used internally by jigdo.
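
To make the "lumping together" property concrete, here is a minimal sketch of an rsync-style weak checksum whose per-chunk results can be merged without rereading the data. This is not jigdo's actual 64 bit sum: the names weak_sum and combine, the (a, b, n) layout and the 32-bit modulus are all assumptions for illustration.

```python
MOD = 1 << 32  # assumed modulus; jigdo's real sum may differ


def weak_sum(data: bytes):
    """Return (a, b, n) for an rsync-style weak checksum of data.

    a is the plain byte sum; b weights each byte by its distance from
    the end of the chunk; n is the chunk length.
    """
    n = len(data)
    a = sum(data) % MOD
    b = sum((n - i) * x for i, x in enumerate(data)) % MOD
    return a, b, n


def combine(left, right):
    """Checksum of left-chunk followed by right-chunk, from sums alone."""
    a1, b1, n1 = left
    a2, b2, n2 = right
    # Every byte of the left chunk moves n2 positions further from the
    # end, so its weight in b grows by n2; that adds n2 * a1 in total.
    return (a1 + a2) % MOD, (b1 + n2 * a1 + b2) % MOD, n1 + n2
```

With this, combine(weak_sum(x), weak_sum(y)) equals weak_sum(x + y). MD5 has no equivalent operation, since each block's result depends on the hash state left by all earlier blocks.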

Hmm. Md5sums are nice, and people are comfortable with them. We'll
need to be generating md5sums of the full ISO images anyway for that

In my original discussion with Phil, I was hoping that at some point
we could completely bypass a lot of the CD-building code and write
templates by *just* parsing the Packages and Sources files on a
mirror. That had the same problem. I think the quickest place to do a
lot of this is in mkisofs, which we already depend on to actually
create the image. At least we can then optimise it so we stream
through the set of files just once.
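
As a rough sketch of the "stream through just once" idea, the following updates a per-file md5 and a whole-stream md5 from the same read. It is not the mkisofs or JTE code: the function name stream_once is hypothetical, and it assumes the image is just the files concatenated, ignoring the filesystem metadata and padding that a real ISO image contains.

```python
import hashlib


def stream_once(paths, chunk_size=1 << 20):
    """Read each file once, feeding two md5 contexts per chunk.

    Returns (per_file, image_digest): per_file maps each path to its
    own md5 hex digest, image_digest is the md5 of all file contents
    concatenated in the given order.
    """
    image_md5 = hashlib.md5()
    per_file = {}
    for path in paths:
        file_md5 = hashlib.md5()
        with open(path, "rb") as f:
            # iter() with a sentinel stops at the empty read at EOF
            for chunk in iter(lambda: f.read(chunk_size), b""):
                file_md5.update(chunk)
                image_md5.update(chunk)
        per_file[path] = file_md5.hexdigest()
    return per_file, image_md5.hexdigest()
```

The point is only that the per-file sums and the full-image sum can share one pass over the data, even though the full-image md5 cannot be derived from the per-file ones.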

>> That's a bug to do with large file support in libstdc++ IIRC? I've
>> noticed the problem myself with large jigdo images. I'm just testing JTE
>> v1.1 right now to make sure I don't have similar issues.
>There *is* currently a problem with big files for gcc 3.x, x<4. However,
>the issue Manty was referring to is something else:
>Currently, the algorithm which searches for matching files inside the image
>sometimes has to discard prospective matches in order to avoid becoming too
>slow (i.e. O(n²) instead of O(n) time). I have an idea for a way to
>improve the accuracy, which should result in a smaller template size
>because more files are found in the image.

Ah, OK.

>> Absolutely. It'd be great to be able to get _lots_ of jigdo files created
>> for all the different options including multiple variants of CD and
>> different DVDs. There are still some more optimisations that should be
>> possible in the image-building stages; at the moment we end up md5summing
>> the entirety of the data on each disk several times, and that's a little
>> bit wasteful.
>If you're talking about the md5sums.txt files on the CDs, note that you can 
>take advantage of jigdo-file's cache when creating them:
>  jigdo-file md5sum --cache=x.db --hex FILES...

At the moment, I'm not actually using the cache file at all - I don't
have a need for it. Maybe we can optimise still further and add
md5-checking (and caching) into my JTE code. There are several places
where we read in all the files and/or generate checksums when doing a
full CD run at the moment:

mirror check
creating the md5sums.txt on each disk
creating the ISO image
jigdo template creation

It would be nice to be able to lose a couple of these passes, and it
should be possible.
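
One way to drop repeated passes is to remember each file's checksum between stages. Below is a minimal in-memory sketch, assuming a file is unchanged when its path, size and modification time are unchanged. The class name Md5Cache is hypothetical, and this is neither jigdo-file's cache format nor anything in the JTE code.

```python
import hashlib
import os


class Md5Cache:
    """Cache md5 digests keyed on (absolute path, size, mtime)."""

    def __init__(self):
        self._entries = {}
        self.reads = 0  # number of times a file was actually read

    def md5(self, path):
        st = os.stat(path)
        key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
        if key not in self._entries:
            self.reads += 1
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            self._entries[key] = h.hexdigest()
        return self._entries[key]
```

If the mirror check filled such a cache, the md5sums.txt step could ask for the same digests without reading the files again.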

Steve McIntyre, Cambridge, UK.                                steve@einval.com
Welcome my son, welcome to the machine.
