Re: binary patch
Jonathan Oxer <email@example.com> writes:
> On Thu, 2003-11-06 at 10:50, Martin Pitt wrote:
> > But isn't rsync supposed to do this? I don't know exactly how
> > efficiently it detects and compresses binary differences, but it
> > definitely does it and not too bad. With rsync, you get both the easy
> > management of complete debs and the bandwidth-saving of binary diffs.
> > The only problem is that apt does not support rsync IIRC, but this
> > could be solved by separately downloading the new debs into apt's cache
> > with a script using rsync.
> Actually IIRC there was some work to make Apt support rsync. Goswin
> Brederlow was talking about adding it in Jan 2001. And a lot of mirrors
> actually are set up to support rsync.
> The problem is that most mirrors are loaded up pretty hard already, and
> if everyone started using rsync they'd probably melt.
> So it's a tradeoff, bandwidth vs CPU. At the moment CPU seems to be the
> factor for mirror admins.
rsync has two problems for this:
1. gzip streams are pretty much unique. A one-character change in the
deb will create a completely different gzip stream (after that
character). The --rsyncable option for gzip tries to flush the gzip
dictionary at certain points so that rsync can catch on again.
2. rsync puts a huge CPU and I/O load on the servers. If every user
used rsync, the servers would break down.
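
A minimal sketch of point 1, assuming zlib's deflate as a stand-in for
gzip and made-up sample data (the word list, sizes and function names
are illustrative only). It flips one bit near the start of the input
and checks how many 1K blocks of the old data still occur verbatim in
the new data, before and after compression:

```python
import random
import zlib

BLOCK = 1024  # block size used for matching, as in rsync-style transfers


def shared_fraction(old, new, block=BLOCK):
    """Fraction of old's fixed-size blocks that occur verbatim anywhere in new."""
    blocks = [old[i:i + block] for i in range(0, len(old), block)]
    return sum(blk in new for blk in blocks) / len(blocks)


def demo():
    # Hypothetical sample data: compressible, but not trivially repetitive.
    rng = random.Random(0)
    words = [b"apt", b"deb", b"mirror", b"rsync", b"gzip", b"patch",
             b"binary", b"stream", b"cache", b"server", b"client", b"block"]
    old = b" ".join(rng.choice(words) for _ in range(40000))

    # Same data with a single bit flipped at byte offset 1000.
    changed = bytearray(old)
    changed[1000] ^= 0x01
    new = bytes(changed)

    raw = shared_fraction(old, new)
    packed = shared_fraction(zlib.compress(old), zlib.compress(new))
    return raw, packed


if __name__ == "__main__":
    raw, packed = demo()
    print(f"uncompressed blocks reusable: {raw:.1%}")
    print(f"compressed blocks reusable:   {packed:.1%}")
```

On the uncompressed data all blocks but the changed one can be reused;
on the compressed streams far fewer match, which is what rsync runs
into with plain gzip.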
Several people, myself included, have made rsync retrievers for apt
with various features, but due to the two problems above it never got
picked up by the apt maintainer. In short, rsync support is not wanted.
> Cheers :-)
> Jonathan Oxer
Way back (somewhere in 2001 iirc) I suggested implementing cnysr
(rsync backwards), which is an rsync with reversed roles. The checksum
files for the server can then be precalculated and stored alongside the
debs (about a 2% increase in mirror size with a 1K block size, less for
bigger blocks) or calculated (and cached) on demand.
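
The 2% figure fits a back-of-the-envelope calculation, assuming each
block gets a 4-byte weak checksum plus a 16-byte strong checksum as in
classic rsync. The record layout and the function name below are
assumptions for illustration, not an actual cnysr file format:

```python
import math

WEAK_BYTES = 4     # assumed: 32-bit rolling checksum per block
STRONG_BYTES = 16  # assumed: 128-bit strong hash per block


def checksum_overhead(file_size, block_size):
    """Size of the checksum file relative to the deb it describes."""
    blocks = math.ceil(file_size / block_size)
    return blocks * (WEAK_BYTES + STRONG_BYTES) / file_size


if __name__ == "__main__":
    deb_size = 10 * 1024 * 1024  # hypothetical 10 MiB package
    for block_size in (1024, 2048, 4096, 8192):
        pct = 100 * checksum_overhead(deb_size, block_size)
        print(f"{block_size:5d}-byte blocks: {pct:.2f}% extra")
```

With these assumptions a 1K block size costs 20/1024 of the file size,
just under 2%, and the overhead halves each time the block size doubles.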
Since no calculation needs to be done on the server side, any HTTP/1.1
server has all the features (the Range header) needed for cnysr. This
means that any HTTP Debian mirror could be used directly, with no
changes apart from the client.
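
A rough sketch of what the client side could look like, under stated
assumptions: an rsync-style weak checksum, MD5 as the strong checksum,
and hypothetical helper names throughout (none of this is taken from an
actual cnysr implementation). The client scans its old deb for blocks
the server's checksum list describes, then asks for the rest with a
plain HTTP Range header:

```python
import hashlib
import random

BLOCK = 1024  # block size of the precomputed checksum file


def weak_sum(block):
    """rsync-style 32-bit weak checksum built from two 16-bit sums."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def signatures(data, block=BLOCK):
    """What a mirror could precompute: one (weak, strong) pair per block."""
    return [(weak_sum(data[i:i + block]), hashlib.md5(data[i:i + block]).digest())
            for i in range(0, len(data), block)]


def find_local_blocks(local, sigs, block=BLOCK):
    """Map remote block index -> offset in the local file with the same bytes."""
    wanted = {}
    for idx, (weak, strong) in enumerate(sigs):
        wanted.setdefault(weak, []).append((idx, strong))
    found = {}
    if len(local) < block:
        return found
    a = sum(local[:block]) & 0xFFFF
    b = sum((block - i) * x for i, x in enumerate(local[:block])) & 0xFFFF
    pos = 0
    while True:
        # Cheap weak check first, strong hash only on a weak hit.
        for idx, strong in wanted.get((b << 16) | a, ()):
            if idx not in found and hashlib.md5(local[pos:pos + block]).digest() == strong:
                found[idx] = pos
        if pos + block >= len(local):
            break
        # Roll the window forward by one byte.
        out, inn = local[pos], local[pos + block]
        a = (a - out + inn) & 0xFFFF
        b = (b - block * out + a) & 0xFFFF
        pos += 1
    return found


def missing_ranges(sigs, found, remote_len, block=BLOCK):
    """Inclusive byte ranges of the remote file that must be downloaded."""
    ranges = []
    for idx in range(len(sigs)):
        if idx in found:
            continue
        start = idx * block
        end = min(start + block, remote_len) - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # merge adjacent blocks
        else:
            ranges.append((start, end))
    return ranges


def range_header(ranges):
    """Value for an HTTP/1.1 Range request header."""
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in ranges)


def demo():
    rng = random.Random(1)
    old = bytes(rng.getrandbits(8) for _ in range(8 * BLOCK))  # deb in apt's cache
    new = old[:3000] + b"PATCHED!" + old[3000:]                # deb on the mirror
    sigs = signatures(new)                                     # fetched checksum file
    found = find_local_blocks(old, sigs)
    return range_header(missing_ranges(sigs, found, len(new)))


if __name__ == "__main__":
    print("Range:", demo())
```

In this sketch a short final block is never matched locally and is
always fetched. The server only ever sees ordinary static file
requests, one for the checksum file and one or more ranged requests
for the deb.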
I also did some tests on using checksums of the uncompressed data
along with checksums of the compressed data and a more complex
algorithm to simulate "rsyncing" the uncompressed data while only
serving the compressed files on the server. That works even better
than the --rsyncable patch to gzip, but it takes a lot of round trips
to the server and back (takes time) and a larger checksum file (2-4%