Re: Apt & rsync
On Fri, 1 Oct, Jason Gunthorpe wrote:
> On Fri, 1 Oct 1999, Gary Allpike wrote:
>
> > Could apt be made to use rsync ??
>
> No, rsync is not suited for such a task
rsync seems quite well suited to the task; it's just that, as you point
out:
> > For example, if I have a package still in my package cache why couldnt apt
> > rsync the older .deb up to the newer version ?
> Nope, the gzip compression scrambles the contents so that rsync doesn't
> have any effect.
This is quite true, but it raises the obvious question: why not change gzip
so that it doesn't scramble the contents so badly? This would have a
slight cost in compression percentage, but bandwidth gains should more
than make up for it. Andrew Tridgell addresses the issue in his original
Ph.D. thesis about rsync; see
http://samba.org/ftp/tridge/thesis/phd_thesis.ps, pages 76-78 for a
discussion. Let me quote a little:
    A different solution to using rsync with compressed files which
    overcomes these problems is to use file compression algorithms
    which do not propagate changes throughout the rest of the
    compressed file when changes are made. ...

    It is ... quite easy to modify almost any existing compression
    algorithm to limit the distance that changes propagate without
    greatly reducing the compression ratio. ...

    The modification is quite simple:

    1. A fast rolling signature is computed for a small window around
       the current point in the uncompressed file;
    2. stream compression progresses as usual;
    3. when the rolling signature equals a pre-determined value the
       compression tables are reset and a token is emitted indicating
       the start of a new compression region.

    This works because the compression will be "synchronized" as soon
    as the rolling signature matches the pre-determined value. Any
    changes in the uncompressed file will propagate an average of
    2^b/c bytes through the compressed file, where b is the effective
    bit strength of the fast rolling signature algorithm and c is the
    compression ratio.

    The value for b can be chosen as a tradeoff between the
    propagation distance ... and the cost in terms of reduced
    compression ... For compression algorithms designed to be used for
    the distribution of files which are many megabytes in size a
    propagation distance of a few kilobytes would be appropriate. In
    that case the first few bits of the fast signature algorithm used
    in rsync could be used to provide a weak fast signature algorithm.
As you can see, this seems quite easy to implement. The main obstacle is
that it introduces another compression scheme for users to keep track of.
But Debian seems to be in the perfect position to implement this (though
it would take a while, of course): we can guarantee that all users would
have the appropriate software.
The necessary modified (and renamed) gzip and apt-get look like a
weekend hack or less; to see how well it works in practice, someone
would need to set up a mirror (probably a partial one) with everything
recompressed this way.
Is there interest in this? Is it a good idea? My weekends are a bit busy
just now, but it sounds like a fun project.
Is this the appropriate list to discuss this on?
Best,
Dylan Thurston
dpt@math.berkeley.edu
(Please CC: me any responses.)