Re: Solving the compression dilema when rsync-ing Debian versions

To: Otto Wyss <otto.wyss@bluewin.ch>
Cc: debian-devel@lists.debian.org
Subject: Re: Solving the compression dilema when rsync-ing Debian versions
From: Andrew Lenharth <adl@cs.washington.edu>
Date: Wed, 10 Jan 2001 10:29:22 -0800 (PST)
Message-id: <[🔎] Pine.LNX.4.21.0101101021300.31159-100000@akagai.cs.washington.edu>
In-reply-to: <[🔎] 1emzfuz.1mfh4h86woi2M%otto.wyss@bluewin.ch>

> > No, this won't work with very many compression algorithms.  Most
> > algorithms update their dictionaries/probability tables dynamically based
> > on input.  There isn't just one static table that could be used for
> > another file, since the table is automatically updated after every (or
> > near every) transmitted or decoded symbol.  Further, the algorithms start
> > with blank tables on both ends (compression and decompression), the
> > algorithm doesn't transmit the tables (which can be quite large for higher
> > order statistical models).
> > 
> Well the table is perfectly static when the compression ends. Even if
> the table isn't transmitted itself, its information is contained in the
> compressed file, otherwise the file couldn't be decompressed either. 

But the tables you have at the end of the compression are NOT what you
want to use for the entire process.  The point of a dynamic table is to
allow the probabilities of different symbols to change dynamically as the
compression happens.  The tables used by the end of the file may be very
different than those used early in the file, to the point where they are
useless for the early part of the file.

Without fear of sounding redundent, EACH symbol is encoded with a
different set of tables.  That is the probability tables or dictionaries
are different for EACH and EVERY character of the file.  And as I said
before, the tables from latter in the compression (which you propose
using all the time) will not even generate the same compressed file as the
one they are based on nor will they be anywhere near optimal for the file.
That is, "gzip --compress-like=foo.gz foo" would generate a entirely
different foo.gz than "gzip foo" would.

I really suggets you investigate LZW based algorithms.  You would find
they do not behave as you think.  Only incredibly simple static
compression algorithms have the properties you desire.

Andrew Lenharth

Reply to:

References:
- Re: Solving the compression dilema when rsync-ing Debian versions
  - From: otto.wyss@bluewin.ch (Otto Wyss)

Prev by Date: Re: lynx 2.8.4dev.16 --with-ssl
Next by Date: Re: debian-devel-digest Digest V101 #59
Previous by thread: Re: Solving the compression dilema when rsync-ing Debian versions
Next by thread: Re: Solving the compression dilema when rsync-ing Debian versions
Index(es):
- Date
- Thread