
Re: Question: Packages.xz and Contents-<arch>.xz



On Thu, 15 Nov 2012, Peter Samuelson wrote:
> [Hideki Yamane]
> > > henrich@hp:/tmp$ du -k Packages.*
> > > 6052	Packages.bz2
> > > 5812	Packages.xz
> > > henrich@hp:/tmp$ time bzip2 -d Packages.bz2 
> > > 
> > > real	0m0.999s
> > > user	0m0.956s
> > > sys	0m0.020s
> > > 
> > > henrich@hp:/tmp$ rm Packages
> > > henrich@hp:/tmp$ time xz -d Packages.xz 
> > > 
> > > real	0m0.565s
> > > user	0m0.532s
> > > sys	0m0.032s
> > 
> > > henrich@hp:/tmp$ time gzip -d Packages.gz 
> > > gzip: Packages already exists; do you wish to overwrite (y or n)? y
> > > 
> > > real	0m1.932s
> > > user	0m0.272s
> > > sys	0m0.012s
> 
> While your post has good points, we need to notice that because of the
> interactive prompt, the 'real' time value for gzip -d is misleading.
> 
> >  decompression speed is 
> >   best  : xz
> >   second: bz2
> >   third : gz
> 
> If you ignore the time gzip spent waiting for you to type 'y', it is
> the fastest, not the slowest.

Yes, gzip is the fastest decompressor (measured by input rate or by user
time).

However, when there is a major difference in compression ratio, or when the
input rate is low because of external factors (e.g. low bandwidth), a stream
with a much higher compression ratio and a slower decompressor can still be
much faster when you measure the output rate.  I.e. it will use more CPU
time/energy, but it will finish the job sooner.
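
To put rough numbers on that trade-off, here is a minimal sketch.  It
assumes total time is simply download time plus decompression CPU time,
with no overlap between the two.  The bz2 and xz sizes and all the user
times come from the measurements quoted above; the gz size and the link
speeds are made-up values, since neither was posted in this thread.

```python
# Rough model: total time = download time + decompression time (no overlap).
# bz2/xz sizes and all user times come from the measurements quoted in the
# thread; the gz size is HYPOTHETICAL, chosen only for illustration.

FORMATS = {
    # name: (size in KiB, decompression user time in seconds)
    "gz": (8000.0, 0.272),   # size is an assumption, not a measurement
    "bz2": (6052.0, 0.956),
    "xz": (5812.0, 0.532),
}


def total_seconds(size_kib: float, cpu_seconds: float, link_kib_per_s: float) -> float:
    """Time to fetch a file over the link and then decompress it."""
    return size_kib / link_kib_per_s + cpu_seconds


def fastest(link_kib_per_s: float) -> str:
    """Name of the format that finishes first at the given link speed."""
    return min(
        FORMATS,
        key=lambda name: total_seconds(*FORMATS[name], link_kib_per_s),
    )


if __name__ == "__main__":
    # Link speeds in KiB/s; all three are hypothetical.
    for link in (100.0, 1000.0, 100000.0):
        print(f"{link:>9.0f} KiB/s: fastest overall is {fastest(link)}")
```

Under these assumptions the smaller file wins on a slow link, and the
cheaper decompressor only wins once the link is fast enough that download
time stops mattering.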

gzip is really fast and widely supported, and it also has much lower
worst-case memory requirements (xz isn't a memory pig when decompressing,
but how much memory it needs depends on the input stream and the worst case
is 6 times the best case, at ~63MiB).  However, xz compresses so much
better, it is not funny.  We should keep both.

bzip2 should be deprecated (i.e. we should transition to xz).  We should
keep bzip2 support as standard because there is way too much .bz2 around,
but start generating .xz instead of .bz2 wherever we can.

I've found that data which depends on large windows *and* sub-window
reordering to compress at its best does _really_ well in xz, and hideously
in gzip and bzip2.  Play with the "intel-microcode" Debian source package
and several compressors, and you will see what I mean.
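
If you don't have that package at hand, a synthetic test shows the
large-window half of the effect (it does not exercise the reordering
part, and it is not a substitute for trying real data).  This is only a
sketch using Python's standard zlib, bz2 and lzma modules: the input is
one random 1 MiB block repeated twice, so the only redundancy sits 1 MiB
back, out of reach of DEFLATE's 32 KiB window and of bzip2's 900 kB
blocks.  The helper names are mine, for illustration only.

```python
import bz2
import lzma
import os
import zlib


def far_apart_repeat(block_bytes: int = 1 << 20) -> bytes:
    """Return a random block followed by an exact copy of itself."""
    block = os.urandom(block_bytes)
    return block + block


def compressed_sizes(data: bytes) -> dict:
    """Compressed size in bytes for each compressor.

    zlib emits a zlib container rather than a .gz file, but it uses the
    same DEFLATE algorithm as gzip, so the sizes are comparable.
    """
    return {
        "gzip": len(zlib.compress(data, 9)),
        "bzip2": len(bz2.compress(data, 9)),
        "xz": len(lzma.compress(data)),  # default preset, .xz container
    }


if __name__ == "__main__":
    data = far_apart_repeat()
    print(f"input: {len(data)} bytes")
    for name, size in compressed_sizes(data).items():
        print(f"{name:>5}: {size} bytes ({100.0 * size / len(data):.1f}%)")
```

Only xz should get anywhere near halving the input here, because its
default dictionary is large enough to see the second copy.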

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

