
Re: Packages file missing from unstable archive

> On Tue, Nov 01, 2005 at 09:54:09AM -0500, Michael Vogt wrote:
> > A problem is that zsync needs to be taught to deal with deb files (that
> > is, it needs to unpack the data.tar and use that for the syncs).

[Anthony Towns]
> That seems kinda awkward -- you'd need to start by downloading the ar
> header, working out where in the file the data.tar.gz starts, then
> redownloading from there. I guess you could include that info in the
> .zsync file though.

Right, the latter.  Having downloaded the .zsync file, you calculate
your local checksums against the ones in that file and you know exactly
what's left to be downloaded and what to do with it.  The .zsync file
includes a sort of map of the structure of the target, not unlike a
jigdo file.
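
To make the checksum comparison concrete, here is a minimal sketch of the
idea, not zsync's actual code or file format.  The block size, the
function names, and the use of SHA-1 for every window are assumptions
made for illustration; a real tool would use a cheap rolling checksum
to avoid hashing every offset.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration only; real tools use far larger blocks


def block_digests(data: bytes, block: int = BLOCK) -> list:
    """Digest of each fixed-size block of the target (the published 'map')."""
    return [hashlib.sha1(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def missing_blocks(local: bytes, target_digests: list, block: int = BLOCK) -> list:
    """Indexes of target blocks that cannot be found anywhere in the local data."""
    # Hash every full-size window of the local file, at every byte offset.
    # (Hypothetical simplification: a rolling checksum would make this cheap.)
    have = {hashlib.sha1(local[i:i + block]).hexdigest()
            for i in range(0, max(len(local) - block, 0) + 1)}
    return [n for n, digest in enumerate(target_digests) if digest not in have]
```

Only the blocks listed by missing_blocks() would have to be fetched; the
rest can be copied out of the local file, whatever offset they sit at.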

> OTOH, there should be savings in the control.tar.gz too, surely --
> it'd change less than data.tar.gz most of the time, no?

He was only comparing data.tar.gz because that made for a simpler
mock-up.  zsync doesn't currently dig into a .deb at all, so this was
just a simulation, as it were.
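
For what it's worth, finding where each member of a .deb starts is not
much work, since a .deb is a plain ar archive with fixed 60-byte member
headers.  A rough sketch follows, assuming the common ar layout; the
function name is made up, and this is the kind of information one could
precompute into the .zsync file rather than anything zsync does today.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed size of every ar member header


def ar_members(blob: bytes) -> list:
    """Return (name, data_offset, size) for each member of an ar archive."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(blob):
        header = blob[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        # Name is space-padded to 16 bytes; some ar variants append a "/".
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # Size is a space-padded decimal field of 10 bytes.
        size = int(header[48:58].decode("ascii"))
        data_offset = pos + HEADER_LEN
        members.append((name, data_offset, size))
        # Member data is padded to an even length.
        pos = data_offset + size + (size % 2)
    return members
```

With the offsets known up front, a client could request the byte range
of data.tar.gz directly instead of fetching the ar header first.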

> Hrm, thinking about it, I guess zsync probably works by storing the
> state of the gzip table at certain points in the file and doing a
> rolling hash of the contents and recompressing each chunk of the
> file

I haven't actually looked at the implementation of zsync, but I've
always assumed that it relies on a homogeneous (i.e., predictable)
gzip algorithm everywhere, works out the known variables by trial and
error, and stores just enough state to reproduce the gzip file
exactly, given those assumptions about the gzip implementation.

For that to be correct, every zsync implementation has to use a
sufficiently homogeneous zlib; for it to be efficient, the same has to
hold for whatever is used to compress files in gzip format.  I've always
harbored my doubts about the deployment scalability of this approach.
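
To illustrate the kind of trial and error I have in mind (and this is
only my assumption about the approach, not a description of what zsync
really does), one can decompress a stream and try each compression
level until the output matches byte for byte.  The sketch uses raw zlib
streams via Python's zlib module to keep gzip header fields out of the
picture; the function name is made up.

```python
import zlib


def matching_levels(original: bytes) -> list:
    """Compression levels that reproduce `original` exactly with the local zlib."""
    payload = zlib.decompress(original)
    # Recompress at every level and keep those that give identical bytes.
    return [level for level in range(0, 10)
            if zlib.compress(payload, level) == original]
```

If the stream was produced by a different zlib version or another
deflate implementation, the list may well come back empty, which is
exactly the homogeneity problem.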

> Anyway, just because you get a different file, that doesn't mean
> it'll act differently; so we could just use an "authentication"
> mechanism that reflects that. That might involve providing sizes and
> sha1s of the uncompressed contents of the ar in the packages file,
> instead of the md5sum of the ar.

Authenticating uncompressed content is a good design choice anyway.
Makes it easier, for instance, to add gpg signatures inside the ar
file, without invalidating existing checksum authentication.
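
A small sketch of why this helps, assuming nothing beyond Python's
standard library: the same payload gzipped with two different header
timestamps gives two different containers, yet the size and SHA-1 of the
uncompressed content are identical.  The helper names are made up, and
SHA-1 is used for both digests just to keep the example short.

```python
import gzip
import hashlib
import io


def gzip_bytes(data: bytes, mtime: int) -> bytes:
    """Compress data into a gzip stream with the given header timestamp."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def container_digest(blob: bytes) -> str:
    """Digest of the compressed container as shipped."""
    return hashlib.sha1(blob).hexdigest()


def content_digest(blob: bytes) -> tuple:
    """(size, digest) of the uncompressed content inside the container."""
    payload = gzip.decompress(blob)
    return len(payload), hashlib.sha1(payload).hexdigest()
```

Two archives built this way fail a container checksum comparison but
pass a content one, which is the behaviour we actually want.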

Conceptually, authenticating content based on a container which is
essentially nondeterministic is a bit like refusing to authenticate a
person because he or she is wearing different clothes from the ones
noted in the auth database.
