
replicating package compression used by dpkg-deb



I'm working on a system for differential transport of .deb packages
using zsync (client-side rsync). This involves decompressing the gzipped
tar archives in the package and recompressing them later. For the
reconstructed package to match the "Size" and "MD5sum" fields in the
APT Packages list, the recompression step must produce exactly the same
bytes as dpkg-deb did originally. This turns out to be a problem:

  $ ar -x /var/cache/apt/archives/fuse-utils_2.5.3-4.1_i386.deb
  $ 
  $ ls
  control.tar.gz  data.tar.gz  debian-binary
  $ 
  $ for x in -{1,2,3,4,5,6,7,8,9}\ {-n,}\ {--rsyncable,}
  > do
  >   gzip -dc data.tar.gz | gzip $x | cmp -s data.tar.gz - && echo $x
  > done
  $ 

So with the currently shipping versions of gzip (I tried both
"stable/updates" and "testing"), none of the 36 option combinations
tried above exactly replicates the compression produced by the standard
package build system.
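
To narrow down where a recompressed stream departs from the original,
something like the following might help. It is only a sketch: the
function name first_diff is made up for illustration, and it assumes
cmp reports the first mismatch as "differ: byte N" or "differ: char N".
The fixed part of a gzip header is 10 bytes (magic, method, flags,
mtime, extra flags, OS), so a mismatch at byte 10 or earlier points at
header fields rather than at the deflate data itself.

```shell
# first_diff ORIGINAL.gz [gzip options...]
# Recompress ORIGINAL.gz with the given gzip options and report where
# the result first differs from the original. (Hypothetical helper.)
first_diff() {
    local orig=$1; shift
    local report offset
    # Without -s, cmp prints the 1-based offset of the first mismatch;
    # its exit status is 0 only when both streams are identical.
    if report=$(gzip -dc "$orig" | gzip "$@" | cmp "$orig" - 2>&1); then
        echo "identical"
        return 0
    fi
    # Pull the byte offset out of "... differ: byte N, line M".
    offset=$(printf '%s\n' "$report" |
        sed -n 's/.*differ: [a-z]* \([0-9][0-9]*\),.*/\1/p')
    if [ -z "$offset" ]; then
        echo "length mismatch"       # one stream is a prefix of the other
    elif [ "$offset" -le 10 ]; then
        echo "header byte $offset"   # within the fixed 10-byte gzip header
    else
        echo "body byte $offset"     # optional header fields or deflate data
    fi
    return 1
}

# Example use, after unpacking the package as above:
#   first_diff data.tar.gz -9 -n
```

A mismatch in the header alone would be easy to patch up afterwards; a
mismatch in the body means the deflate output itself differs.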

It appears that dpkg-deb does not exec gzip, and it's not dynamically
linked with anything except glibc. I suppose that it's statically linked
against zlib1g or something like it. So the question is, how can the
exact compression algorithm used by dpkg-deb be made available for
another piece of software? Is it something that's well-specified, or is
it liable to change at any moment?

I see that the Packages lists have started to include SHA-1 and SHA-256
hashes, but I gather that these are not yet in use by any standard part
of the Debian system. Is there any chance that these could be
respecified to be the hash of the uncompressed package contents, like
this?

  $ cat debian-binary control.tar data.tar | sha256sum

In addition to facilitating alternative package distribution systems
such as the one I'm working on, this would have the advantage of
allowing the substitution of other compression algorithms like bzip2,
without requiring any change in the Debian package distribution
infrastructure.
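
As a rough illustration of that proposal, the hash could be computed
without writing the uncompressed tar files to disk. This is a sketch
only: content_hash is a hypothetical helper, and it assumes the package
has already been unpacked with "ar -x" into a directory holding
debian-binary, control.tar.gz and data.tar.gz, as in the listing above.

```shell
# content_hash DIR
# Print the SHA-256 of the uncompressed members of an unpacked .deb,
# i.e. the same bytes as: cat debian-binary control.tar data.tar
# (Hypothetical helper; DIR must contain the three ar members.)
content_hash() {
    [ -d "$1" ] || return 1
    (
        cd "$1" || exit 1
        cat debian-binary           # format version line, stored uncompressed
        gzip -dc control.tar.gz     # uncompressed control archive
        gzip -dc data.tar.gz        # uncompressed data archive
    ) | sha256sum | cut -d' ' -f1   # keep only the hex digest
}

# Example use:
#   mkdir unpacked && cd unpacked
#   ar -x /var/cache/apt/archives/fuse-utils_2.5.3-4.1_i386.deb
#   content_hash .
```

Because only decompressed bytes are hashed, the result does not depend
on the compression level or on which gzip implementation built the
package.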

Of course the whole problem would be radically simplified if the
"--rsyncable" option were used when the packages were being built. Does
anyone know the rationale for not doing that?


-- Ian Bruce


