Re: Packages file missing from unstable archive
Anthony Towns <email@example.com> writes:
> Hrm, thinking about it, I guess zsync probably works by storing the
> state of the gzip table at certain points in the file and doing a
> rolling hash of the contents and recompressing each chunk of the file;
> that'd result in the size of the .gz not necessarily being the same, let
> alone the md5sum.
zsync has to recompress the raw data locally, and for that it has to
guess at the implementation that was used to compress the original
file. For debs that should be deterministic. zsync can guarantee that
recompression gives the same result by checking that it does when
creating the checksum file. If the input file and zsync's
recompression come out identical at that point, they will always be
identical, unless zsync changes its gzip implementation.
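A minimal sketch of that check, assuming the archive side compressed
with plain `gzip -9 -n` (zsync's actual gzip settings and internals
may differ):

```shell
# Simulate the archive-side compression with fixed, reproducible settings.
printf 'hello zsync\n' | gzip -9 -n > sample.gz

# At checksum-creation time, recompress the raw data the same way ...
gunzip -c sample.gz | gzip -9 -n > sample.regz

# ... and only rely on recompression if the result is byte-identical.
cmp -s sample.gz sample.regz && echo recompressable
```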
> Feh, trying to verify this with ~512kB of random data, gzipped, I just
> keep getting "Aborting, download available in zsyncnew.gz.part". That's
> not terribly reassuring. And trying it with gzipped text data, I get
> stuck on 99.0%, with zsync repeatedly requesting around 700 bytes.
> Anyway, if it's recompressing like I think, there's no way to get the
> same compressed md5sum -- even if the information could be transferred,
> there's no guarantee the local gzip _can_ produce the same output as
> the remote gzip -- imagine if it had used gzip -9 and your local gzip
> only supports -1 through -5, eg.
zsync doesn't fork off some unknown local gzip, and it knows what its
own gzip routines can produce. It can easily be guaranteed that the
zsync client behaves the same way as the remote zsync checksum program
that tests for recompressability.
The failure to sync the file is definitely a bug in zsync. Even if the
recompression fails (which it should know beforehand), it should fall
back to syncing the compressed data and produce the expected result.
> Hrm, it probably also means that mirrors can't use zsync -- that is,
> if you zsync fooA to fooB you probably can't use fooA.zsync to zsync
> from fooB to fooC.
> Anyway, just because you get a different file, that doesn't mean it'll
> act differently; so we could just use an "authentication" mechanism
> that reflects that. That might involve providing sizes and sha1s of the
> uncompressed contents of the ar in the packages file, instead of the
> md5sum of the ar. Except the previous note probably means that you'd
> still need to use the md5sum of the .deb to verify mirrors; which means
> mirrors and users would have different ways of verifying their
> downloads, which is probably fairly undesirable.
Too bad Packages files contain the md5sum of the full deb. Changing
that would be an ugly and lengthy process, so let's not do that.
The only sane way is to make zsync produce identical debs. That isn't
trivial, but it is not impossible.
> Relatedly, mirrors (and apt-proxy users, etc) need to provide Packages.gz
> of a particular md5sum/size, so they can't use Packages.diff to speed
> up their diffs. It might be worth considering changing the Release file
> definition to just authenticate the uncompressed files and expect tools
> like apt and debootstrap to authenticate only after uncompressing. A
> "Compression-Methods: gz, bz2" header might suffice to help tools work
> out whether to try downloading Packages.gz, Packages.bz2 or just plain
> Packages first. Possibly "Packages-Compress:" and "Sources-Compress:"
> might be better.
% gunzip <Packages.gz | gzip -9 >Packages.gz.2
% gunzip <Packages.gz | gzip -9 >Packages.gz.3
% gunzip <Packages.gz | gzip -9 -n >Packages.gz.4
% gunzip <Packages.gz | gzip -9 -n >Packages.gz.5
% md5sum *
The problem is the timestamp gzip embeds in its output: of the four
results above, only the -n variants are guaranteed to match. If you
patch dak to use the -n switch, then Packages.diff can be used to
update the uncompressed Packages file, and recompressing it yields a
byte-identical Packages.gz.
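A quick sketch of the mirror-side workflow this would enable, assuming
ed-style diffs (as Packages.diff uses) and `gzip -9 -n` on both sides;
the file names here are made up:

```shell
# Two versions of a Packages file: what the mirror has, what the archive has.
printf 'Package: foo\nVersion: 1\n' > Packages.old
printf 'Package: foo\nVersion: 2\n' > Packages.new

# The archive publishes an ed-style diff and the recompressed file.
diff -e Packages.old Packages.new > Packages.diff
gzip -9 -n < Packages.new > archive.gz

# The mirror applies the diff to its uncompressed copy ...
cp Packages.old Packages
{ cat Packages.diff; printf 'w\nq\n'; } | ed -s Packages

# ... recompresses with the same settings, and gets identical bytes.
gzip -9 -n < Packages > mirror.gz
cmp -s archive.gz mirror.gz && echo identical
```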
Alternatively, zsync could record the timestamp in the .zsync file and
recompress using that same timestamp.
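Carrying the timestamp along would be cheap: the gzip header stores it
as a 32-bit little-endian MTIME field in bytes 4-7 (RFC 1952). For
example (assumes a little-endian host, so `od -tu4` matches gzip's
byte order):

```shell
# gzip -n stores MTIME = 0; without -n it stores the input's timestamp.
printf 'data' | gzip -n > ts0.gz

# Bytes 4-7 of the header hold MTIME, little-endian (RFC 1952).
od -An -tu4 -j4 -N4 ts0.gz | tr -d ' '    # prints 0
```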