Re: Apt and rsync... I know...
Doug Holland <email@example.com> writes:
> /me brandishes a 2x4 and approaches the decomposed equine carcass
> I just finished doing my apt upgrade, which included updated
> openoffice.org packages. Like many others, I have to do this over a
> dialup connection. The openoffice.org-bin package is 43 megabytes,
> which is a 2 1/2 hour download over a 56k dialup connection,
> assuming you have halfway decent phone lines.
> Let's take a look in changelog.Debian.gz for openoffice.org-bin:
> > openoffice.org (1.1.0-5) unstable; urgency=low
> > * Modified patches:
> > - recent-files.diff: #include <assert.h>, fixes build failure on
> > powerpc [RE]
> > -- Rene Engelhard <firstname.lastname@example.org> Fri, 16 Jan 2004 21:25:20 +0100
> That's it. One piddly little patch, which most likely affected one
> line of source code, required me to waste 2.5 hours of dialup
> bandwidth downloading a .deb file that's almost identical to the one
> I downloaded yesterday. Am I the only one who finds this
> Let's face it. We need to find a way to make apt do rsync to save
> us some pain.
> I remember the reason why apt is not currently doing rsync is
There is a problem with rsyncing gzip files: a change in the raw data
makes the rest of the compressed stream differ. The gzip --rsyncable
option tries to solve that, and apparently well enough. All debs should
be compressed using that option.
> because it hogs I/O and CPU cycles on the Debian servers. That's a
> valid reason, but surely there are ways around it.
It's called cnysr, and I asked for it to be supported, I think over a
year ago, but never got a reply.
cnysr (rsync backwards, as you might have noticed) reverses the roles
the client and server play in rsync. Instead of the client sending
blockwise checksums to the server, the server sends them, either
calculated on the fly or from a precalculated file. The client then
checks which blocks it needs and requests only those. The blockwise
checksums are (depending on the block size; assuming 1K) about 2% of
the file size.
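The exchange described above can be sketched in Python. This is a
minimal sketch under my own assumptions, not a cnysr file format: the
checksum file is modelled as one (weak, strong) pair per fixed-size
block of the new file, the weak checksum is an rsync-style rolling sum,
the strong one is MD5, and `block_checksums`, `find_local_blocks` and
`range_header` are hypothetical names. The client slides a window over
its old copy, notes which blocks of the new file it already has, and
turns the rest into a single HTTP/1.1 `Range` header (the HTTP/1.1
spec allows several byte ranges in one request). Slicing `new` stands
in for the mirror's reply.

```python
import hashlib
import random

BLOCK = 1024       # block size in bytes (assumed; the mail uses 1K as example)
MOD = 1 << 16


def weak_sum(block: bytes) -> tuple[int, int]:
    """rsync-style rolling checksum parts (a, b) of one block."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def block_checksums(data: bytes) -> list[tuple[int, bytes]]:
    """Server side, precomputable: one (weak, strong) pair per block."""
    entries = []
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        a, b = weak_sum(block)
        entries.append(((b << 16) | a, hashlib.md5(block).digest()))
    return entries


def find_local_blocks(old: bytes, checksums: list[tuple[int, bytes]]) -> dict[int, int]:
    """Client side: map block index of the new file -> offset in the old file.

    Only full-size blocks can match; a short final block is always fetched.
    """
    wanted: dict[int, list[tuple[int, bytes]]] = {}
    for idx, (weak, strong) in enumerate(checksums):
        wanted.setdefault(weak, []).append((idx, strong))
    found: dict[int, int] = {}
    if len(old) < BLOCK:
        return found
    a, b = weak_sum(old[:BLOCK])
    off = 0
    while True:
        # Cheap weak lookup first, strong hash only on a weak hit.
        for idx, strong in wanted.get((b << 16) | a, []):
            if idx not in found and hashlib.md5(old[off:off + BLOCK]).digest() == strong:
                found[idx] = off
        if off + BLOCK >= len(old):
            break
        # Slide the window one byte: drop old[off], add old[off + BLOCK].
        out_byte, in_byte = old[off], old[off + BLOCK]
        a = (a - out_byte + in_byte) % MOD
        b = (b - BLOCK * out_byte + a) % MOD
        off += 1
    return found


def range_header(missing: list[int], total_size: int) -> str:
    """Coalesce missing block indices into one HTTP/1.1 multi-range header."""
    ranges: list[list[int]] = []
    for idx in sorted(missing):
        start = idx * BLOCK
        end = min(start + BLOCK, total_size) - 1   # byte ranges are inclusive
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1][1] = end                    # merge with adjacent range
        else:
            ranges.append([start, end])
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in ranges)


rng = random.Random(7)
new = bytes(rng.randrange(256) for _ in range(4500))  # 4 full blocks + 404 bytes
old = b"prefix!" + new[:1024] + new[2048:4096]        # lacks block 1 and the tail

checksums = block_checksums(new)                      # what a mirror would serve
found = find_local_blocks(old, checksums)
missing = [i for i in range(len(checksums)) if i not in found]
print(found)                                          # {0: 7, 2: 1031, 3: 2055}
print(range_header(missing, len(new)))                # bytes=1024-2047,4096-4499

# Stand-in for the server's reply to the range request, then rebuild.
fetched = {i: new[i * BLOCK:(i + 1) * BLOCK] for i in missing}
rebuilt = b"".join(old[found[i]:found[i] + BLOCK] if i in found else fetched[i]
                   for i in range(len(checksums)))
print(rebuilt == new)                                 # True
```

All the hashing over the old file happens on the client; the server
only hands out the precomputed checksum list and plain byte ranges.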
- blockwise checksums are small and can be cached, since they never
change for a given file
- cnysr uses the HTTP/1.1 protocol to fetch files and blocks
thereof. If the master calculates the checksum files, any HTTP mirror
can immediately be used to cnysr files.
- I/O and CPU load is shifted to the client; the server does no
computations at all with precalculated checksums.
- cnysr needs 2% more mirror space (or less for bigger block sizes) for
the precalculated checksums. Not calculating them would put us back
where rsync is.
- cnysr needs to tell the server what to send. There is an additional
round-trip per block, but HTTP/1.1 allows requesting blocks in
- HTTP connections don't stay open very long: slow clients, or big
files with minor changes, will have to reconnect if the connection sits
idle for a while (usually 60 s of idle time kills the connection).
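A quick check of the 2% figure and how it shrinks with bigger blocks.
This assumes rsync-style per-block checksums, i.e. a 4-byte rolling
checksum plus a 16-byte strong hash; the sizes cnysr actually uses may
differ, and `checksum_overhead` is just a name for this example.

```python
WEAK_BYTES = 4      # rolling checksum per block (assumed, as in rsync)
STRONG_BYTES = 16   # MD4/MD5-sized strong checksum per block (assumed)


def checksum_overhead(block_size: int) -> float:
    """Checksum file size as a fraction of the data size."""
    return (WEAK_BYTES + STRONG_BYTES) / block_size


for block_size in (512, 1024, 2048, 4096):
    print(f"{block_size:5d}-byte blocks: {checksum_overhead(block_size):.2%}")
```

With 1K blocks this comes to about 1.95%, in line with the 2% above,
and each doubling of the block size halves it (at the price of
refetching a larger block for every small change). For the 43 MB
openoffice.org-bin package, 1K blocks would mean roughly 0.84 MB of
checksums (43 × 20 / 1024) under these assumptions.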
> I suggest that rsync files be precalculated, so rsync downloads
> don't have to be crunched on the fly. The servers would store .deb
> files - foo-x.y.z.deb, and they would store the rsync diffs between
> it and the previous version - foo.x.y.z-1_x.y.z.rsdeb. That way, if
> the user doing an apt-get upgrade has the previous .deb file in his
> cache, apt would download the rsync diff file instead of the full
> .deb, saving loads of bandwidth, and since the rsyncs are
> precomputed and cached, the servers don't get hosed.
> Am I totally off base suggesting this?
The diff files only work for exactly the two versions they are made
for. cnysr works with any two files, with the same efficiency rsync
has. The drawback is that HTTP mirrors are needed, while the rsync diff
files could be fetched via FTP too.
But either way, the --rsyncable option is needed first.
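To see why --rsyncable makes the difference, here is a small Python
sketch of the idea behind it, modelled loosely on gzip's approach
(assumption: the compressor is reset after any byte where the sum of
the last 4096 bytes is divisible by 4096; `WINDOW` and `reset_points`
are names made up for this example). Because each reset point depends
only on the window of bytes before it, inserting a byte can only
disturb reset points within one window of the edit; after that they
all reappear, shifted by the inserted byte.

```python
import random

WINDOW = 4096  # bytes in the rolling window (assumed, mirrors gzip's choice)


def reset_points(data: bytes, window: int = WINDOW) -> list[int]:
    """Offsets where a --rsyncable-style compressor would reset.

    A reset happens after byte i when the sum of the last `window` bytes
    (data[i - window + 1 .. i]) is divisible by `window`.
    """
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i >= window - 1 and rolling % window == 0:
            points.append(i)
    return points


rng = random.Random(42)
old = bytes(rng.randrange(256) for _ in range(300_000))
pos = 1000
new = old[:pos] + b"X" + old[pos:]  # insert one byte near the start

old_points = reset_points(old)
new_points = reset_points(new)

# One window past the edit, every reset point reappears shifted by
# exactly the one inserted byte, so the two streams line up again.
tail_new = [p for p in new_points if p >= pos + WINDOW]
tail_old = [p + 1 for p in old_points if p + 1 >= pos + WINDOW]
print(tail_new == tail_old)  # True
```

The sketch only finds the reset points; a real compressor would also
flush its state at each of them, which is what lets the compressed
blocks after an edit come out identical and therefore rsyncable.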
PS: Planned extensions to cnysr include handling files (such as debs)
in uncompressed form, on the fly, both in the checksum file and on the
client side (with no change to the deb files on the server).