Re: /usr/share/doc/ files and gzip/xz/no compression



* Andreas Barth [2011-08-15 23:59 +0200]:
> * Lars Wirzenius (liw@liw.fi) [110815 23:27]:
> > On Mon, Aug 15, 2011 at 11:04:51PM +0200, Carsten Hey wrote:
> > > * Lars Wirzenius [2011-08-15 18:33 +0100]:
> > > >      raw     gz      xz
> > > >      584    163     134     file sizes (MiB)
> > > >        0    421     450     savings compared to raw (MiB)
> > > >     -421      0      29     savings compared to current gz (MiB)
>
> > In other words, it's 130 MiB against xz's 134 MiB. I'll leave it to
> > others to decide if it's a significant difference.
>
> bzip2 is definitely a more conservative choice than xz. If it's
> smaller, then it's superior to xz.

bzip2 compresses better on average for some file types, xz[1] compresses
better on average for others (sizes in bytes):

                   gzip      bzip2       xz     bzip2+xz[3]
  text files[2]   94312922  73496587  77783076  73496587
  other files     16577181  14609893  14275484  14275484
  sum            110890103  88106480  92058560  87772071

The "other files" also include a lot of text files. If we compressed
Debian packages instead, xz would presumably win.

Anyway, I don't think this difference of 4 MiB on a desktop system is
significant.
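
As a quick check of that figure, here is a minimal sketch that converts
the "sum" row of the table above into MiB differences. The byte totals
are copied from the table; the names SUMS, MIB and diff_mib are only
illustrative.

```python
# Byte totals copied from the "sum" row of the table above.
SUMS = {"gzip": 110890103, "bzip2": 88106480, "xz": 92058560}

MIB = 1024 * 1024  # bytes per MiB


def diff_mib(a, b):
    """Return by how many MiB the total for format a exceeds that of b."""
    return (SUMS[a] - SUMS[b]) / MIB


# Print the pairwise differences between the three single-format totals.
for a, b in [("xz", "bzip2"), ("gzip", "bzip2"), ("gzip", "xz")]:
    print(f"{a} - {b}: {diff_mib(a, b):.2f} MiB")
```

The xz vs. bzip2 line is the "4 MiB" mentioned above (a bit under 4 MiB
when computed exactly).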


I would prefer to avoid bloating the set of pseudo-essential packages
without a good reason, and I think users should be able to decompress all
files in /u/s/d.  There are plans to let dpkg depend on liblzma2 instead
of xz, and it already depends on libbz2-1.0.  If dpkg's dependency on
libbz2 is going to be removed in the future, I would prefer to let libbz2
drop out of the pseudo-essential set and use xz for /u/s/d as well;
otherwise I would prefer bzip2 over xz for /u/s/d.


Carsten


 [1] I used neither -e nor -9, but the difference should not be that big
     on files in /usr/share/doc.
 [2] find ... -regex '.*\(changelog\|copyright\|README\|TODO\|NEWS\).*[.]gz'
 [3] bzip2 for text files and xz for other files.  This is of course
     nothing we should consider doing.
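
For anyone who wants to reproduce numbers like these on their own
system, here is a rough sketch. It is not the script used for the table
above; it assumes the gzip column is the on-disk size of the existing
.gz files, translates the find regex from [2] into Python syntax, and
uses the default presets of Python's bz2 and lzma modules (lzma's
default corresponds to xz without -e or -9, as in [1]). All function
names are made up for illustration.

```python
import bz2
import gzip
import lzma
import re
import zlib
from pathlib import Path

# Python translation of the find -regex pattern in footnote [2];
# like find -regex, it has to match the whole path.
TEXT_RE = re.compile(r".*(changelog|copyright|README|TODO|NEWS).*[.]gz")


def classify(path):
    """Return "text" if the path matches the footnote [2] pattern, else "other"."""
    return "text" if TEXT_RE.fullmatch(str(path)) else "other"


def recompressed_sizes(raw):
    """Return the bzip2 and xz sizes (in bytes) of the uncompressed data."""
    return {
        "bzip2": len(bz2.compress(raw)),   # default level 9
        "xz": len(lzma.compress(raw)),     # default preset, no -e / -9
    }


def survey(root):
    """Sum gzip (on-disk), bzip2 and xz sizes for all .gz files below root."""
    totals = {kind: {"gzip": 0, "bzip2": 0, "xz": 0} for kind in ("text", "other")}
    for path in Path(root).rglob("*.gz"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        try:
            raw = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            continue  # skip files that are not valid gzip
        bucket = totals[classify(path)]
        bucket["gzip"] += len(data)
        for name, size in recompressed_sizes(raw).items():
            bucket[name] += size
    return totals


if __name__ == "__main__":
    for kind, sizes in survey("/usr/share/doc").items():
        print(kind, sizes)
```

Everything is read into memory, which should be fine for documentation
files but is an assumption worth keeping in mind for larger trees.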

