
First release of Jigsaw Download - distributed download & on-the-fly assembly of CD images



Hello all,

Yes, it finally happened! Last December, after a discussion on this
list about a new CD download concept, I went off and started coding. 
Somehow, the program became much bigger than I expected, but right
now, at 7400 lines of code, it is becoming usable, so I'm releasing it
for the first time.

The C++ sources of jigdo 0.5.0 are now available from
  <http://atterer.net/debian/>

Download, unpack, "./configure", then "make" or "make deb".

Quick start:
  mount /cdrom
  (find /cdrom -name '*.deb'; echo; dd if=/dev/cdrom) \
  | ./jigdo-file make-template --image=- --template=cd.template --files-from=-
______________________________________________________________________

Some of the following info I already posted last December, but I doubt
a lot of people remember:

The basic idea of jigdo is the same as that of the current pseudo
image kit (PIK): When you want to distribute a Debian CD, store the
.deb packages on FTP servers, then let people download them
individually and reassemble the image by "filling in the blanks", i.e. 
adding the directory information, README files etc.

The PIK uses rsync for this second step. jigdo uses what I call a
"template" file instead, a kind of binary diff which contains the dir
info and other data, along with a couple of checksums. The template
can be distributed via the normal Debian FTP mirror network.

This first release of jigdo contains the command-line tool
"jigdo-file", which generates the template from an image. The image
could be an ISO9660 image, a UDF image, a big tar file... - the
program doesn't care! You give jigdo-file the finished image and a
list of files which might be contained in contiguous regions of the
image.

Furthermore, I have plans for a GUI user application which cannot
generate the templates, but which provides a very convenient way of
downloading and *assembling* the images, including automatic search
for the best server, maybe concurrent downloads from several servers,
scanning of previous Debian releases' CDs so you only need to download
what's new, and, above all, on-the-fly assembly, i.e. you don't need
1300MB disc space to create a 650MB CD image. This tool should run on
Windows, too - I hope I will achieve this by using GTK+.

BTW, a nice property of the scheme is that it is easy for anyone to
create a customized CD image - they only need to provide web space for
their customized packages/whatever, not for the whole image.

I NEED HELP!
It would be great if someone were able to start work on the GUI tool -
otherwise, there will certainly be none for woody's release. 
Additionally, I have practically no coding experience under Windows. 
Furthermore, I will definitely *not* be able to add support for jigdo
to debian-cd, nor the pseudo image kit, simply because I haven't got
the disc space for a local Debian mirror. Anne Bezemer mentioned at
one point she'd be willing to fix the PIK - that would be really nice,
Anne!
______________________________________________________________________

This is how it works in detail:

--- 1 ---

Someone uses debian-cd to create an ISO image. Then, they use
jigdo-file to generate the template file, e.g. like this:

  jigdo-file make-template --image=woody.iso \
          --template=woody.template *.deb

All the .deb files are checked - if they are in the image, they no
longer appear in the template data, instead they're just referenced
from it. jigdo-file also supports pipes, so you needn't even store the
big image on disc, but can pipe it directly into the program (this was
a bit more difficult to implement than it might seem to you!):

  mkisofs ... \
  | jigdo-file make-template --image=- \
          --template=woody.template *.deb

You can even pipe *both* filenames and the image into it:

  (find . -name '*.deb'; echo; mkisofs ...) \
  | jigdo-file make-template --image=- \
          --template=woody.template --files-from=-
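
Conceptually, make-template has to find out where each supplied file
occurs in the image and keep only the remaining bytes as literal data.
The following Python sketch shows that idea in the most naive way
possible. It is an illustration only, not how jigdo-file is written:
the function name make_entries() and the tuple layout are made up
here, everything is held in memory, and only the first occurrence of
each file is looked for. A tool that accepts the image from a pipe
needs a streaming approach instead of a plain substring search.

```python
import hashlib
from typing import List, Tuple, Union

# Hypothetical entry layout: ("data", bytes) or ("file", length, md5 hex).
Entry = Union[Tuple[str, bytes], Tuple[str, int, str]]


def make_entries(image: bytes, files: List[bytes]) -> List[Entry]:
    """Split the image into literal data and references to known files."""
    # Locate the first occurrence of each candidate file in the image.
    matches = []
    for content in files:
        pos = image.find(content) if content else -1
        if pos >= 0:
            matches.append((pos, content))
    matches.sort(key=lambda m: m[0])

    entries: List[Entry] = []
    cursor = 0  # everything before this offset is already covered
    for pos, content in matches:
        if pos < cursor:
            continue  # overlaps an earlier match; leave it as literal data
        if pos > cursor:
            entries.append(("data", image[cursor:pos]))
        entries.append(("file", len(content), hashlib.md5(content).hexdigest()))
        cursor = pos + len(content)
    if cursor < len(image):
        entries.append(("data", image[cursor:]))
    return entries
```

Files that do not occur in the image are simply ignored, which matches
the "might be contained" wording above.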

The template file is now created. It is a binary file that contains
entries of the form:

- "at this point, insert x bytes of data"
  (the data to be inserted is also in the template)
- "at this point, insert x bytes of external file data which have an
  md5sum of y" (NB: The file _name_ is not recorded here)
- (once at the end) length and checksum of complete image
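
To make the three kinds of entries concrete, here is a small Python
model of them together with the reassembly step. This is a sketch
under assumptions: the names DataEntry, FileEntry, Template and
assemble() are invented for illustration and say nothing about the
actual binary layout of the template or about jigdo's C++ code.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DataEntry:
    """'Insert x bytes of data'; the bytes live in the template itself."""
    data: bytes


@dataclass
class FileEntry:
    """'Insert x bytes of external file data with md5sum y' (no file name)."""
    length: int
    md5: bytes


@dataclass
class Template:
    entries: List[Union[DataEntry, FileEntry]]
    image_length: int  # recorded once, at the end
    image_md5: bytes


def assemble(template: Template, files_by_md5: Dict[bytes, bytes]) -> bytes:
    """Rebuild the image from the template plus the supplied file contents."""
    out = bytearray()
    for entry in template.entries:
        if isinstance(entry, DataEntry):
            out += entry.data
        else:
            content = files_by_md5[entry.md5]  # KeyError if a part is missing
            if len(content) != entry.length:
                raise ValueError("length mismatch for external file")
            out += content
    image = bytes(out)
    # Final check against the length and checksum of the complete image.
    if len(image) != template.image_length:
        raise ValueError("image length mismatch")
    if hashlib.md5(image).digest() != template.image_md5:
        raise ValueError("image checksum mismatch")
    return image
```

Because a FileEntry carries only a length and a checksum, any file
with matching contents can fill the gap, whatever it is called and
wherever it was downloaded from.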

The entire template data is compressed with zlib. jigdo-file is quite
demanding in CPU power and I/O - use a fast machine! :)

--- 2 ---

A second, human-readable ".jigdo" file is created along with the
template. Apart from a URL for the template file, it contains
information about which checksums map to which filenames, and a list
of mirrors. This is what it might look like (exact file format isn't
set in stone yet):

    [Jigdo]
    Version: 1.0
    Generator: jigdo-file/0.5.0

    [Image]
    Description: Debian GNU/Linux 2.2 r35 _Potato_ - ...
    Template: ftp://ftp.debian.org/pub/debian/cd-images/2.2r35.template
    Filename: binary-i386-1.iso
    Hash: QrxELOWvjQ2ROwI6NSQxGA # MD5sum, base64 encoded

    [Servers]
    debian: ftp://ftp.leo.org/pub/comp/os/unix/linux/Debian/debian/
    nonUS: http://some.mirror.net/debian-nonUS/
    nonUS: ftp://ftp.debian.org/pub/debian/non-US/

    [Parts]
    # Either indirection through mirrored "server dir":
    NBtLU+0yWENK7z65ZV6Ytw: debian:potato/r35/Contents.i386.gz
    Fx8llO3m4xaGK40AiKwa5Q: nonUS:main/binary-i386/ssh_1.2.3_i386.deb
    # ...or directly insert mirrored URL:
    2DrdnXVSxgwWcls+SaayRQ: http://f.net/debian-nonUS/myssh/ssh_4.2_i386.deb
    # 2 different alternatives for same file (checksum is the same):
    NBtLU+0yWENK7z65ZV6Ytw: directoryA:foo-0.1.tar.gz
    NBtLU+0yWENK7z65ZV6Ytw: directoryB:foo-0.1.tgz

An initial version of the [Parts] section is also output by
jigdo-file. I think some #include mechanism will be useful at some
point for including the list of Debian mirrors.
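
Since the format is not final, the following Python sketch is only a
guess at how a download tool might read such a file and turn a
checksum into candidate URLs. The functions parse_jigdo() and
candidate_urls() are hypothetical, and the rules used here (a '#'
starts a comment, keys may repeat, a location whose prefix is a
[Servers] label is expanded, anything else is taken as a complete URL)
are assumptions read off the example above.

```python
from collections import defaultdict
from typing import Dict, List


def parse_jigdo(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse 'Key: value' lines grouped under [Section] headers.

    Keys may repeat (several mirrors per label, several sources per
    checksum), so every key maps to a list of values.
    """
    sections: Dict[str, Dict[str, List[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], defaultdict(list))
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()].append(value.strip())
    return sections


def candidate_urls(sections, checksum: str) -> List[str]:
    """Expand every [Parts] entry for a checksum into full URLs."""
    servers = sections.get("Servers", {})
    urls = []
    for location in sections.get("Parts", {}).get(checksum, []):
        label, _, path = location.partition(":")
        if label in servers:  # "label:relative/path"
            urls += [base + path for base in servers[label]]
        else:  # already a complete URL
            urls.append(location)
    return urls
```

With the example file, the checksum Fx8llO3m4xaGK40AiKwa5Q would
expand to two URLs, one for each "nonUS" mirror, and the tool could
then pick whichever server it considers best.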


Aside: If you look e.g. at the line

    Fx8llO3m4xaGK40AiKwa5Q: nonUS:main/binary-i386/ssh_1.2.3_i386.deb

you'll notice that there exists a "server-relative" path, which is
"main/binary-i386/ssh_1.2.3_i386.deb" in this case. The way to tell
jigdo-file where to split the absolute path on your harddisk is to use
a double slash, like this:

  jigdo-file make-template --image=... --template=... etc. \
          /opt/mirrors/debian//main/binary-i386/ssh_1.2.3_i386.deb
                             ^^
jigdo-file automatically recurses into directories, so you could also
use:

  jigdo-file make-template --image=... --template=... etc. \
          /opt/mirrors/debian//
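
The double-slash convention amounts to a simple string split. Here is
a Python sketch of it; split_server_path() is a made-up helper, and
splitting at the first "//" as well as returning None when there is no
marker are assumptions, since the behaviour in those cases is not
described above.

```python
from typing import Optional, Tuple


def split_server_path(path: str) -> Tuple[str, Optional[str]]:
    """Split at the first '//' into (local prefix, server-relative path)."""
    prefix, sep, relative = path.partition("//")
    if not sep:
        return path, None  # no '//' marker present
    return prefix + "/", relative
```

For the ssh example this yields the local prefix
"/opt/mirrors/debian/" and the server-relative path
"main/binary-i386/ssh_1.2.3_i386.deb".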


The jigdo file is also gzipped and put on the Debian server. The way I
imagine it, the GUI program will register to be called whenever the
user clicks on a link to a .jigdo file in their web browser. Using the
.jigdo file, the remaining info and data can be downloaded.

--- 3 ---

Anyone wishing to download the CD image only needs to tell the
yet-to-be-written GUI tool the location of the ".jigdo" file. Using
some heuristics to choose one or more servers to download from, it can
fetch the parts and assemble the ISO image without further
interaction.

Currently, jigdo-file supports a *very* basic form of this, without
any download facility:

  jigdo-file make-image --image=foo.iso --template=foo.template *.deb

This first scans all the .deb files to get their checksums and then
reassembles the image. It already does a lot of internal checking, but
you can also check the entire generated image at the end, with:

  jigdo-file verify --image=foo.iso --template=foo.template
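
The scanning step boils down to computing one checksum per file and
indexing the files by it. Below is a Python sketch of that step. The
helper names md5_label(), read_chunks() and scan() are invented, and
the encoding (base64 of the MD5 digest with the "=" padding removed)
is an assumption based on the 22-character strings in the .jigdo
example; the encoding jigdo-file really uses may differ in detail.

```python
import base64
import hashlib
from typing import Dict, Iterable, Iterator


def md5_label(chunks: Iterable[bytes]) -> str:
    """Return the MD5 of the data as unpadded base64 (22 characters)."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii").rstrip("=")


def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's contents in blocks so big files fit in memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                return
            yield chunk


def scan(paths: Iterable[str]) -> Dict[str, str]:
    """Map checksum label -> path for every supplied file."""
    return {md5_label(read_chunks(p)): p for p in paths}
```

The resulting dictionary is what lets an assembler look up a file by
the checksum stored in the template, regardless of its name.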


Features that will appear in the immediate future, i.e. next versions:

- With make-image, you will not need to supply all required files and
  create the image with a single jigdo-file invocation; instead, you
  can merge in more and more files over several invocations until the
  image is complete. This is intended for re-using old .deb's from
  previous Debian releases, e.g. from a couple of CDs.
- A cache for the file checksums, so that jigdo-file need not scan
  your whole local Debian mirror over and over again.
- Some control of the [Parts] section output by jigdo-file, i.e. you
  can assign labels like "debian" or "non-US".

Gotchas with this release:

- Some stuff not implemented, but the examples above work.
- Seems not to work on my ARM box - still need to explore.
- Doesn't link with gcc-3.0 - search me why!
- /Should/ work with >4GB files/images, but never tested.
- Test cases #768 and many afterwards fail - see the "torture" crashme
  program - this is a problem with torture only, not with jigdo-file.

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  CS student at the Technische  |  GnuPG key:
  | \/¯|  http://atterer.net  |  Universität München, Germany  |  0x888354F7
  ¯ ´` ¯
