
JTE (Jigdo Template Export) v1.0



Guys (especially Richard),

I've been looking for a while at ways to make jigdo run faster when
generating template files from ISO images. (See
http://lists.debian.org/debian-cd/2003/12/msg00041.html for my
original mail.) The core problem is that once we have an ISO image,
jigdo essentially has to brute-force that image back into a binary
blob (template file) and a list of files needed to rebuild the image
(jigdo file). On my home system, with very good disks and a
reasonable processor, creating jigdo files for a DVD ISO image can
take several hours. Multiply that up by 11 architectures...
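To make the cost concrete, here is a deliberately naive Python sketch of the brute-force idea: search the whole image for the bytes of every candidate file from the mirror. It only illustrates why the work scales with image size times the number of candidate files; it is not jigdo-file's real algorithm, which is considerably smarter, and the function name `naive_find_files` is hypothetical.

```python
def naive_find_files(image: bytes, candidates: dict) -> list:
    """Deliberately naive brute force (illustration only, not jigdo-file's
    real algorithm): search the whole image for each candidate file's bytes.

    image:      the complete ISO image contents
    candidates: mapping of mirror path -> file contents
    Returns a sorted list of (offset_in_image, path) for files that were found.
    """
    matches = []
    for path, data in candidates.items():
        if not data:
            continue  # empty files carry no data to locate
        offset = image.find(data)  # one full scan of the image per candidate
        if offset != -1:
            matches.append((offset, path))
    return sorted(matches)


# Toy "image": header bytes, then two files with some filler in between.
image = b"HDR" + b"AAAA" + b"xx" + b"BBBB"
print(naive_find_files(image, {"a.deb": b"AAAA", "b.deb": b"BBBB", "c.deb": b"CCCC"}))
# → [(3, 'a.deb'), (9, 'b.deb')]
```

mkisofs, by contrast, already knows exactly which file it is writing at each point in the image, so it never has to search at all.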

There are a few ways to improve this that I can see:

1. Modify jigdo so it knows about the internals of ISO images and can
   efficiently scan them (bad, not very generic for jigdo)

2. Write a helper tool to dump extra information for jigdo to use
   alongside the ISO image (helper tool written, but modifying jigdo
   to use this looks HARD)

3. Patch mkisofs to write .jigdo and .template files alongside the
   ISO image

I've now done #3, and the patch for mkisofs is at

  http://www.einval.com/~steve/software/CD/mkisofs-JTE.patch.gz

In the same directory I have a tool to dump the contents of (and
rebuild images from) .jte files and another one to dump the contents
of .template files.

How to use it:
==============

To use this code, specify the locations of the output .jigdo, .template
and .jte files alongside the ISO image. The .jte file is an
intermediate helper file that I'll probably remove in the next
release. You can also specify a minimum file size: files smaller than
this will just be dropped into the binary template data rather than
listed as separate files to be found on the mirror. For example:

mkisofs -J -r -o /home/steve/test1.iso \
        -jigdo-helper /home/steve/test1.jte \
        -jigdo-jigdo /home/steve/test1.jigdo \
        -jigdo-template /home/steve/test1.template \
        -jigdo-min-file-size 16384 \
        /mirror/jigdo-test

If the -jigdo-* options are not used, the normal mkisofs execution
path is not affected. The above invocation will create four output
files: the ISO image plus the .jte, .jigdo and .template files. I've
tested this extensively with various input data, and I can recreate
the ISO images using jigdo-file and the jigdo-mirror wrapper.
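The effect of -jigdo-min-file-size can be pictured as a simple partition of the input tree. The Python sketch below is a hypothetical helper (`partition_by_min_size` is not part of the patch), and it assumes a file exactly at the threshold is listed in the .jigdo file rather than embedded; the patch's actual boundary behaviour may differ.

```python
import os


def partition_by_min_size(root: str, min_size: int) -> tuple:
    """Split the files under root into (listed, embedded) path lists.

    listed:   files of at least min_size bytes, to be referenced from the
              .jigdo file and fetched from a mirror later
    embedded: smaller files, whose data just goes into the template
    (Assumption: a file of exactly min_size bytes counts as "listed".)
    """
    listed, embedded = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) >= min_size:
                listed.append(path)
            else:
                embedded.append(path)
    return sorted(listed), sorted(embedded)
```

For the example invocation, this would correspond to `partition_by_min_size("/mirror/jigdo-test", 16384)`.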

How it works:
=============

I've hooked all the places in mkisofs where it normally writes
image data. All the normal data writes (directory entries etc.) I
simply pass through and build into the template file. Any *file*
data is passed through along with information about the original
file. If that file is large enough, I grab the filename and the MD5 of
the file's data, so I can just write a file match record into the
template file (and then the jigdo file) instead of the data itself.
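A rough Python sketch of that hook layer follows. The record types, the in-memory list standing in for the template file, and the `rebuild` helper are all assumptions made for illustration: the real .template format written by the patch (and read by jigdo-file) is a different binary format, and mkisofs does not necessarily hand over a whole file in a single write.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class RawData:
    """Bytes stored verbatim in the template (dir entries, small files...)."""
    data: bytes


@dataclass
class FileMatch:
    """A 'fetch this file from a mirror' record instead of the data itself."""
    path: str
    size: int
    md5: str


Record = Union[RawData, FileMatch]


class TemplateBuilder:
    """Hypothetical hook layer: every image write is routed through here."""

    def __init__(self, min_file_size: int):
        self.min_file_size = min_file_size
        self.records: List[Record] = []

    def write_raw(self, data: bytes) -> None:
        # Normal image data: pass it straight through into the template,
        # merging with the previous record if that was raw data too.
        if self.records and isinstance(self.records[-1], RawData):
            self.records[-1].data += data
        else:
            self.records.append(RawData(data))

    def write_file(self, path: str, data: bytes) -> None:
        # File data: small files are treated like any other raw data;
        # large ones become a match record holding the name and MD5.
        if len(data) < self.min_file_size:
            self.write_raw(data)
        else:
            md5 = hashlib.md5(data).hexdigest()
            self.records.append(FileMatch(path, len(data), md5))


def rebuild(records: List[Record], fetch: Callable[[str], bytes]) -> bytes:
    """Recreate the image; fetch(path) returns a file's bytes from a mirror."""
    out = bytearray()
    for rec in records:
        if isinstance(rec, RawData):
            out += rec.data
        else:
            data = fetch(rec.path)
            if len(data) != rec.size or hashlib.md5(data).hexdigest() != rec.md5:
                raise ValueError(f"mirror copy of {rec.path} does not match")
            out += data
    return bytes(out)
```

Recording the MD5 at image-creation time is what lets the rebuild step refuse a mirror copy that has changed since the image was made.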

How fast is it?
===============

On my *laptop* (600MHz P3, slow laptop disk) I can make a template
file in parallel with the ISO image from a typical 500MB data set in
about 2 minutes. By simply not creating the ISO (-o /dev/null), this
time halves again. The data set I'm using here is a copy of the woody
i386 r2 update CD, as it's a handy image I had lying around.

What's left to do?
==================

1. Testing! :-) This is where you lot come in! Please play with this
   some more and let me know if you have any problems, especially with
   data corruption.
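When testing, a quick independent check for data corruption is to compare the rebuilt image with the original byte for byte. This is a minimal Python sketch (the helper name `first_difference` is hypothetical, not one of the tools mentioned above); it reports the offset of the first differing byte, which is more useful than a bare yes/no when tracking down which write went wrong.

```python
def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 20):
    """Return the offset of the first byte where two files differ, or None
    if they are identical (including identical length)."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a, chunk_b = a.read(chunk_size), b.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # Common prefix is equal, so one file simply ends earlier.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended at the same point
            offset += len(chunk_a)
```

For example, comparing test1.iso against the image that jigdo-file rebuilds from test1.jigdo and test1.template should return None.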

More features:

2. Add support for -jigdo-exclude option(s), so that we can keep
   README.* and other files that go on Debian CDs, but often change on
   the mirrors, out of the .jigdo file. Reasonably easy to do, and I'm
   playing with this now.

3. Add pattern-matching in the .jigdo file (e.g. /mirror/debian ->
   Debian:). Again, should be easy.

4. Cosmetic cleanup of the .jigdo output. Easy.

5. MUCH harder: re-reading and re-encoding .iso images that have been
   modified since they were first written. This is necessary for
   the boot code used on several architectures in debian-cd. I see how
   to do it - basically diff the image on disk to the one we would
   recreate from the .template file and write a new template file to
   match that. It's going to take some work...
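For items 2 and 3 in the list above, the per-file decision might look something like the Python sketch below. Everything here is an assumption for illustration: the option syntax is not settled, the helper name `jigdo_entry` is hypothetical, matching exclude patterns against the file's basename is only one possible choice, and the output imitates the `Label:path` form from the /mirror/debian -> Debian: example.

```python
import fnmatch
import posixpath


def jigdo_entry(path, excludes, mappings):
    """Decide how one file would appear in the .jigdo output.

    Returns None if the file matches an exclude pattern (its data would then
    stay in the template), otherwise a "Label:relative/path" string, or the
    unchanged path if no mapping applies.
    excludes: glob patterns matched against the basename, e.g. "README.*"
    mappings: (local_prefix, label) pairs; the longest matching prefix wins
    """
    name = posixpath.basename(path)
    if any(fnmatch.fnmatchcase(name, pat) for pat in excludes):
        return None
    for prefix, label in sorted(mappings, key=lambda m: len(m[0]), reverse=True):
        root = prefix.rstrip("/") + "/"
        if path.startswith(root):
            return label + ":" + path[len(root):]
    return path
```

Trying the longest prefix first means a specific mapping such as /mirror/debian wins over a more general /mirror one.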
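The diff step in item 5 could start from something like this Python sketch, which finds the sector-aligned byte ranges where the image on disk differs from the one recreated from the .template file. It assumes the boot-code tools patch the image in place without changing its length, and the function name `changed_ranges` is hypothetical. A new template would then store those ranges as raw data, and any file match record overlapping a changed range would have to be turned back into raw data as well.

```python
SECTOR = 2048  # 2048-byte sectors, as in mkisofs output


def changed_ranges(recreated: bytes, on_disk: bytes, block_size: int = SECTOR):
    """Return merged (offset, length) ranges of blocks that differ between
    the image recreated from the template and the image now on disk.

    Assumes the post-processing only patched bytes in place, so both images
    have the same length.
    """
    if len(recreated) != len(on_disk):
        raise ValueError("image length changed; this sketch cannot handle that")
    ranges = []
    for off in range(0, len(on_disk), block_size):
        block = on_disk[off:off + block_size]
        if recreated[off:off + block_size] == block:
            continue
        if ranges and ranges[-1][0] + ranges[-1][1] == off:
            start, length = ranges[-1]
            ranges[-1] = (start, length + len(block))  # extend previous run
        else:
            ranges.append((off, len(block)))
    return ranges
```

Merging adjacent changed sectors keeps the number of new raw-data records small when a boot loader rewrites a contiguous area.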

I hope people find this useful - at the moment I shudder at the
thought of releasing sarge (10+ CDs, netinst, business card, 2 DVDs
per arch) without making this kind of change. It'll take a week to
generate the release images otherwise...

-- 
Steve McIntyre, Cambridge, UK.                                steve@einval.com
"It's actually quite entertaining to watch ag129 prop his foot up on
 the desk so he can get a better aim."          [ seen in ucam.chat ]
