
Bug#128818: [patch] packages.gz diff support for apt



On Thu, Nov 18, 2004 at 08:39:37PM +0100, Michael Vogt wrote:
> Patches for the package file are generated like this:
> "diff Packages-23-08-2004 Packages-24-08-2004 | gzip -c >      \
>  Packages_diff_`md5sum Packages-23-08-2004|awk '{print $1}'`.gz"
> 
> The code will download patches until it finds an empty patch; it then
> assumes that the index is up to date and stops. If it does not find a
> patch, it will automatically fall back to Packages.bz2 and then to
> Packages.gz. The code is diffed against the arch repository at:
> http://people.debian.org/~mdz/arch/apt@packages.debian.org
> (apt@packages.debian.org/apt--main--0) 

This sounds like a good and easy solution to me. However, a client that
is N updates behind needs N+1 iterations, each of which downloads a
patch, applies it, computes the md5sum and polls the webserver again.
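To make the N+1 count concrete, here is a minimal Python sketch of the scheme quoted above, under stated assumptions: a dict stands in for the webserver, each "patch" is a toy list of hunks computed with `difflib` rather than real `diff` output, and the helpers `md5_of`, `make_patch`, `apply_patch`, `publish_daily` and `client_update` are hypothetical names introduced only for illustration.

```python
import difflib
import hashlib


def md5_of(lines):
    """md5 of the file these lines would form (one trailing newline per line)."""
    return hashlib.md5("".join(line + "\n" for line in lines).encode()).hexdigest()


def make_patch(old, new):
    """Toy stand-in for `diff old new`: (start, end, replacement) hunks, bottom-up."""
    ops = difflib.SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in reversed(ops) if tag != "equal"]


def apply_patch(lines, patch):
    buf = list(lines)
    for i1, i2, repl in patch:  # bottom-up, so earlier indices are still valid
        buf[i1:i2] = repl
    return buf


def publish_daily(history):
    """Server: one patch per day, keyed by the md5 of the older file.
    The newest file gets an empty patch, which tells the client to stop."""
    patches = {md5_of(old): make_patch(old, new) for old, new in zip(history, history[1:])}
    patches[md5_of(history[-1])] = []
    return patches


def client_update(local, patches):
    """Client: fetch and apply patches until the empty one arrives.
    Returns the updated lines and the number of downloads made."""
    downloads = 0
    while True:
        patch = patches.get(md5_of(local))  # stands in for one HTTP request
        downloads += 1
        if patch is None:
            raise LookupError("no patch: fall back to Packages.bz2, then Packages.gz")
        if not patch:
            return local, downloads
        local = apply_patch(local, patch)
```

With four daily versions and a client holding the oldest one, `client_update` makes four requests: three real patches plus the terminating empty one.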

This is avoidable. Every time a new Packages file becomes available,
calculate an ed script (diff -e) from the old and today's Packages file.
As a bonus, this is also smaller, since deletions are expressed as
simple line ranges and the deleted lines are not included in the
'patch'.
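A minimal Python sketch of what such an ed script looks like, assuming `difflib` as a stand-in for `diff -e` (the hunks it picks may differ from GNU diff's, but the command format is the same); `ed_script` is a hypothetical helper name.

```python
import difflib


def ed_script(old, new):
    """Return an ed script in the style of `diff -e old new`.

    Commands are emitted bottom-up, so each command's line numbers still
    refer to the untouched upper part of the file when it runs. A deletion
    is just a range plus "d"; the deleted text itself is never included.
    """
    out = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        rng = str(i1 + 1) if i2 - i1 == 1 else f"{i1 + 1},{i2}"
        if tag == "delete":
            out.append(rng + "d")
        elif tag == "insert":
            out += [f"{i1}a", *new[j1:j2], "."]
        elif tag == "replace":
            out += [rng + "c", *new[j1:j2], "."]
    return "".join(line + "\n" for line in out)
```

For example, deleting two lines from a four-line file yields the single line `2,3d`, with none of the removed text.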

Then, for each existing Packages.<md5sum>.diff.gz, append the newly
calculated ed script. The result is a new ed script that transforms the
Packages file with that <md5sum> directly into the most current one.
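The server-side bookkeeping can be sketched in Python as follows, assuming today's ed script has already been produced (e.g. by `diff -e`) and is passed in as text; `publish` and `write_out` are hypothetical helpers, and only the file name pattern comes from the message.

```python
import gzip
from pathlib import Path


def publish(scripts, prev_md5, new_md5, todays_script):
    """Fold today's ed script (previous Packages -> new Packages) into the archive.

    `scripts` maps the md5sum of some older Packages file to an ed script
    that turns that file into the previous newest one. Appending today's
    script makes every entry lead to the new file instead.
    """
    for md5 in scripts:
        scripts[md5] += todays_script            # old -> previous, then previous -> new
    scripts.setdefault(prev_md5, todays_script)  # yesterday's file, if not yet listed
    scripts[new_md5] = ""                        # empty script for up-to-date clients
    return scripts


def write_out(scripts, directory):
    """Write each entry as Packages.<md5sum>.diff.gz."""
    for md5, script in scripts.items():
        with gzip.open(str(Path(directory) / f"Packages.{md5}.diff.gz"), "wt") as f:
            f.write(script)
```

A convenient side effect: yesterday's empty entry turns into exactly today's ed script once the append runs.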

Applying the ed script in place works like this:
$ ( zcat "$patch"; echo w ) | ed Packages
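For readers without `ed` at hand, this is a rough Python model of that pipeline, assuming the script contains only the `a`, `c` and `d` commands that `diff -e` emits; `apply_ed` and `apply_patch_file` are hypothetical helpers, not part of apt.

```python
import gzip
import re
from pathlib import Path

# diff -e emits only these forms: "Na", "Nc"/"N,Mc", "Nd"/"N,Md".
CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed(lines, script):
    """Apply a diff -e style ed script to a list of lines and return the result.

    Like diff -e output itself, a text line consisting of a single "."
    cannot be represented.
    """
    buf = list(lines)
    it = iter(script.splitlines())
    for cmd in it:
        m = CMD.match(cmd)
        if m is None:
            raise ValueError(f"unsupported ed command: {cmd!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        text = []
        if op in "ac":
            for line in it:            # text block ends at a lone "."
                if line == ".":
                    break
                text.append(line)
        if op == "a":
            buf[start:start] = text    # insert after line `start` (0a = at top)
        elif op == "c":
            buf[start - 1:end] = text  # replace lines start..end
        else:
            del buf[start - 1:end]     # delete lines start..end
    return buf


def apply_patch_file(packages, patch):
    """Rough equivalent of: ( zcat "$patch"; echo w ) | ed Packages"""
    with gzip.open(str(patch), "rt") as f:
        script = f.read()
    path = Path(packages)
    lines = apply_ed(path.read_text().splitlines(), script)
    path.write_text("".join(line + "\n" for line in lines))  # the final "w"
```

Because the commands arrive bottom-up, each one can be applied directly to the buffer without adjusting line numbers.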

The md5sum can (and probably should) be checked afterwards, of course;
there are several ways to do that. You could, for example, append the
md5sum as an ed comment in the ed script, which does not make apt
download anything extra. This way, a Packages file update requires
exactly one download and two md5sum calculations on the client (one
before, to determine the filename, and one after, to verify). Analogous
to the original suggestion, you can add an empty ed script for the
current md5sum to cater for people who run apt-get update while already
up to date. It is no longer needed to mark the last ed script, though,
since every ed script transforms its base file directly into the newest
Packages file.
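The client side then reduces to one fetch between two checksums. Below is a minimal Python sketch under stated assumptions: the hypothetical `fetch(md5)` callable returns `(script, expected_md5)` or `None`, and how the expected checksum travels (the message suggests an ed comment inside the script) is left abstract. A compact copy of the `diff -e` subset of ed is included so the block runs on its own; `update`, `md5_of` and `apply_ed` are illustrative names.

```python
import hashlib
import re

CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed(lines, script):
    """Minimal applier for diff -e output (a, c and d commands only)."""
    buf, it = list(lines), iter(script.splitlines())
    for cmd in it:
        m = CMD.match(cmd)
        if m is None:
            raise ValueError(f"unsupported ed command: {cmd!r}")
        start, op = int(m.group(1)), m.group(3)
        end = int(m.group(2) or start)
        text = []
        if op in "ac":
            for line in it:
                if line == ".":
                    break
                text.append(line)
        if op == "a":
            buf[start:start] = text
        elif op == "c":
            buf[start - 1:end] = text
        else:
            del buf[start - 1:end]
    return buf


def md5_of(lines):
    return hashlib.md5("".join(line + "\n" for line in lines).encode()).hexdigest()


def update(local, fetch):
    """One download; md5 before (to pick the file) and after (to verify)."""
    before = md5_of(local)              # 1st checksum: selects Packages.<md5sum>.diff.gz
    got = fetch(before)                 # the only download
    if got is None:
        raise LookupError("no ed script for this md5sum: fetch the full Packages file")
    script, expected = got
    result = apply_ed(local, script)    # an empty script leaves the file unchanged
    if md5_of(result) != expected:      # 2nd checksum: verify the result
        raise ValueError("md5sum mismatch after applying the ed script")
    return result
```

A client several days behind makes the same single request as one that is a day behind, since the server already concatenated the intermediate scripts.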

The downside is, of course, a little more wasted disk space on the
server; the upside is a much faster round trip for clients.

--Jeroen

-- 
Jeroen van Wolffelaar
jeroen@wolffelaar.nl
http://jeroen.A-Eskwadraat.nl


