
Re: Solving the compression dilema when rsync-ing Debian versions

> No, I want rsync not even to be mentioned. All I want is something
> similar to
>         gzip --compress-like=old-foo foo
> where foo will be compressed as old-foo was, or as equivalently as
> possible. Gzip does not need to know anything about foo except how it
> was compressed. The switch "--compress-like" could be added to any
> compression algorithm (bzip?) as long as it's easy to retrieve the
> compression scheme. Besides, the following is completely legal but
> probably not very sensible

No, this won't work with very many compression algorithms.  Most
algorithms update their dictionaries/probability tables dynamically as
they consume input.  There isn't one static table that could be reused
for another file, because the table is updated after every (or nearly
every) transmitted or decoded symbol.  Furthermore, the algorithms start
with blank tables on both ends (compression and decompression); the
tables themselves are never transmitted, since they can be quite large
for higher-order statistical models.
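To make the point concrete, here is a minimal sketch (in Python, my
illustration, not anything gzip actually does) of the compression side of
LZW. Note that the only static part is the initial 256 single-byte
entries; every multi-byte dictionary entry is created on the fly from the
particular input being compressed, so there is no reusable "table" to
hand to another file:

```python
def lzw_compress(data: bytes) -> list[int]:
    # The dictionary starts with only the 256 single bytes --
    # the one part that is the same for every input.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = next_code  # dictionary grows from *this* input
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out
```

The decompressor rebuilds the identical dictionary by mirroring these
updates as it decodes, which is why neither side ever needs to transmit
the table, and also why a table "borrowed" from old-foo would be useless
for foo.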

I suggest you read about LZW and arithmetic coding with higher-order
statistical models.  Try "The Data Compression Book" by Nelson (?) for a
fairly good overview of how these work.

What is better and easier is to ensure that the compression is
deterministic (gzip by default is not; bzip2 seems to be), so that rsync
can decompress, rsync the uncompressed data, recompress, and get the
exact same file back on the other end.
Andrew Lenharth
