
Re: Solving the compression dilemma when rsync-ing Debian versions



On Mon, Jan 08, 2001 at 08:27:53AM +1100, Sam Couter wrote:
> Otto Wyss <otto.wyss@bluewin.ch> wrote:
> > 
> > So why not solve the compression problem at the root? Why not try to
> > change the compression so that it produces a compressed result with
> > the same (or similar) difference rate as the source?
> 
> Are you going to hack at *every* different kind of file format that you
> might ever want to rsync, to make it rsync-friendly?
> 
> Surely it makes more sense to make rsync able to deal with different
> formats more efficiently.

I think you reach the right conclusion, but for the wrong reason.

Either you fix rsync for each of n file formats, or you fix n file formats
for rsync :)

The advantage of doing it in rsync-land is that you can do a better job: you
apply the inverse of the compression at both ends, calculate the differences,
and re-apply compression (probably gzip rather than the original algorithm,
but it depends) to the differences.  Trying to hack compression algorithms to
fit rsync is in general a bad idea.  Rusty could probably get away with it
for gzip, because gzip is very simple - decompressing it is just interpreting
codes like "repeat the 17 characters you saw 38 characters ago".

Other, more sophisticated algorithms, like bzip2 (go and read about the
Burrows-Wheeler Transform, it's amazing ;) would be much harder to hack in any
reasonable way.
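The reason it's harder: the BWT sorts all rotations of a whole block, so a
one-byte change in the input can reshuffle the sort order and scramble the
entire output block - there is no local code you can patch the way you can
with gzip's back-references.  A toy forward transform (naive rotation sort;
real bzip2 adds run-length and Huffman stages on top):

    def bwt(block):
        # A sentinel that sorts before everything else marks the
        # original rotation, which makes the transform invertible.
        block = block + "\0"
        rotations = sorted(block[i:] + block[:i]
                           for i in range(len(block)))
        # The transform is the last column of the sorted rotations.
        return "".join(row[-1] for row in rotations)

    print(bwt("banana"))   # -> "annb\0aa"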

--

|> |= -+- |= |>
|  |-  |  |- |\

Peter Eckersley
(pde@cs.mu.oz.au)
http://www.cs.mu.oz.au/~pde
	
for techno-leftie inspiration, take a look at
http://www.computerbank.org.au/


