Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide

On 2015-06-15 05:04:46 +0200, Guillem Jover wrote:
> On Sun, 2015-06-14 at 16:48:21 +0200, Vincent Lefevre wrote:
> > (this example is a postfix mail log) and uses much less memory for
> > compression:
> > 
> > $ sh -c 'ulimit -v 200000; lzip -9 < mail.log > /dev/null'
> > $ sh -c 'ulimit -v 800000; xz -9 < mail.log > /dev/null'
> > xz: (stdin): Cannot allocate memory
> > $ sh -c 'ulimit -v 800000; xz -9 < /dev/null > /dev/null'
> > xz: (stdin): Cannot allocate memory
> > 
> > Note: see the 200000 for lzip and 800000 for xz.
> The preset levels do not match between lzip and xz. For example for -9, xz
> uses a dictionary size of 64 MiB, while lzip uses 32 MiB. Other parameters
> are also probably quite different.

I don't think the dictionary size alone really matters. What matters
is the size of the result. On the above mail.log example, lzip both
compresses better and uses less memory. From the tests I've done
with -9, it seems that:

* lzip uses much less memory than xz;
* lzip often compresses better than xz (though only slightly better
  most of the time);
* lzip sometimes compresses much better than xz (mail.log example);
* xz sometimes compresses better than lzip.

So, I would say that there isn't an absolutely better compressor in
practice, but there are some good reasons that people may prefer lzip.
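
For anyone who wants to repeat the size comparison on their own files,
here is a minimal sketch. The function name compressed_sizes is
hypothetical, only the -9 level is compared (as in my tests), and tools
that are not installed are skipped:

```shell
# Hypothetical helper: print "<tool> <compressed size in bytes>" for FILE,
# once per installed compressor, always at level -9.
compressed_sizes() {
    file=$1
    for tool in lzip xz; do
        # Skip a compressor that is not installed instead of failing.
        command -v "$tool" >/dev/null 2>&1 || continue
        # Compress to stdout and count the bytes; tr strips the padding
        # that some wc implementations add.
        size=$("$tool" -9 < "$file" | wc -c | tr -d ' ')
        printf '%s %s\n' "$tool" "$size"
    done
}
```

Running "compressed_sizes mail.log" prints one line per tool. Memory use
can be checked separately with the ulimit -v commands quoted above.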

I haven't tested decompression time, but I have never heard of
complaints about it for either lzip or xz.

> In addition lzip seems to be substantially slower (at least) when
> compressing compared to xz using the same preset levels.

Yes, however this doesn't always matter, in particular in a compress
once / transfer often / uncompress often model.

And AFAIK, the request is not to drop xz support, just to add lzip
support (though the "instead of" in the subject could be ambiguous).

Concerning reproducibility, I suppose that the output may change for lzip
too if it gets a larger dictionary size in the future. Compressors
could allow users to ensure reproducibility by specifying parameters;
xz has lots of parameters, and I wonder whether the OP has used them.
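
As a sketch of what "specifying parameters" could look like: both tools
let you give the dictionary size explicitly instead of relying on what a
preset level implies. The function names below are hypothetical and the
values are only illustrative. Pinning parameters does not by itself
guarantee identical output across different versions of a compressor.

```shell
# Hypothetical wrappers; both read stdin and write stdout.

# xz: start from preset 9, then override the dictionary size explicitly.
# $1 is the dictionary size, e.g. 32MiB.
xz_explicit() {
    xz --lzma2=preset=9,dict="$1"
}

# lzip: give the dictionary size limit in bytes ($1) and the match
# length limit explicitly (273 is what -9 uses).
lzip_explicit() {
    lzip -s "$1" -m 273
}
```

For example, "xz_explicit 32MiB < mail.log > mail.log.xz" and
"lzip_explicit 33554432 < mail.log > mail.log.lz" both request a 32 MiB
dictionary.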

Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)