
/usr/share/doc/ files and gzip/xz/no compression



On Mon, Aug 15, 2011 at 05:16:55PM +0900, Charles Plessy wrote:
> On Mon, Aug 15, 2011 at 01:48:50AM +0200, Adam Borowski wrote:
> > 
> > * A year ago, I repacked CD1, .xz took 66% space needed by .gz.  This time,
> >   on the whole archive, gains are somewhat smaller: 72%.  I guess that CD1
> >   is code-heavy while packages of lower priorities tend to have more data.
> 
> Also, many files in /usr/share/doc are gzipped as per §12.3; that can prevent
> us from getting the full benefit of xz compression.  In some of my packages
> containing mostly such files, the benefit of switching to xz is almost nil.  I
> wonder if it still makes sense to compress these files by default:
> 
>  - Most systems have enough space to keep them uncompressed,
>  - other systems just do not install these files,
>  - some filesystems are compressed on the fly,
>  - the binary packages themselves are compressed.

On the other hand, many computers now have an SSD for speed, and SSDs
are relatively small. Further, most users will rarely, if ever, need the
files in /usr/share/doc, so leaving them uncompressed risks wasting a
fair amount of disk space for no particular benefit.

To get some actual numbers, I wrote the attached script. On my laptop
running squeeze, it reports:

    Total size of *.gz files in /usr/share/doc: 170542915
    Total size of uncompressed *.gz files in /usr/share/doc: 611945610
    Total size of *.gz files in /usr/share/doc converted into *.xz: 140588208

That indicates that compressing documentation with xz instead of gz
saves only a little (about 18% of the gz size), whereas not compressing
at all wastes a lot (the raw files are about 3.6 times the size of the
gz ones). Putting the numbers into a table for easier comparison:

     raw     gz      xz
     584    163     134     file sizes (MiB)
       0    421     450     savings compared to raw (MiB)
    -421      0      29     savings compared to current gz (MiB)
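
As a rough cross-check of the table, the conversion from the byte counts
above to MiB can be sketched as follows. The helper names to_mib and
percent_of are made up for this illustration and are not part of the
attached script; the three byte counts are copied from the script output
quoted above.

```shell
#!/bin/sh
# Byte counts copied from the script output reported in this message.
gz=170542915
raw=611945610
xz=140588208

# Hypothetical helper: convert a byte count to MiB, rounded to an integer.
to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.0f\n", b / 1048576 }'
}

# Hypothetical helper: size $1 as a percentage of size $2, rounded.
percent_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * a / b }'
}

echo "raw: $(to_mib "$raw") MiB, gz: $(to_mib "$gz") MiB, xz: $(to_mib "$xz") MiB"
echo "gz is $(percent_of "$gz" "$raw")% of raw; xz is $(percent_of "$xz" "$gz")% of gz"
```

The first line of output should reproduce the "file sizes" row of the
table; the savings rows are the differences between those values.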

So I would definitely vote for continuing to compress files in
/usr/share/doc. (Note that these numbers cover only files that are
currently *.gz, not all files in /usr/share/doc. See script for
details.)
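
To see how much of /usr/share/doc the *.gz files actually account for,
something like the following could be used. This is only a sketch and I
have not included its numbers here: sum_sizes is a hypothetical helper,
and like the attached script it assumes GNU find for -printf.

```shell
#!/bin/sh
# Hypothetical helper: sum the sizes in bytes of regular files under
# directory $1. With a second argument, count only files whose name
# matches that glob. Assumes GNU find (for -printf).
sum_sizes() {
    if [ -n "${2:-}" ]; then
        find "$1" -type f -name "$2" -printf '%s\n'
    else
        find "$1" -type f -printf '%s\n'
    fi | awk '{ s += $1 } END { printf "%d\n", s }'
}

# Directory to inspect; defaults to /usr/share/doc.
docdir=${1:-/usr/share/doc}
if [ -d "$docdir" ]; then
    echo "All files under $docdir: $(sum_sizes "$docdir")"
    echo "Only *.gz files under $docdir: $(sum_sizes "$docdir" '*.gz')"
fi
```

The difference between the two totals is the part of the directory that
the figures above do not cover.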

I'm OK with allowing use of xz for compressing the files.

-- 
Freedom-based blog/wiki/web hosting: http://www.branchable.com/
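#!/bin/sh
# Compare the on-disk size of the *.gz files in /usr/share/doc with their
# uncompressed size and with their size when recompressed using xz.
# Requires GNU find (for -printf).

set -e

# Sum of the sizes of the gzipped files as stored on disk.
gzsum=$(find /usr/share/doc -type f -name '*.gz' -printf '%s\n' |
        awk '{ s += $1 } END { print s }')
echo "Total size of *.gz files in /usr/share/doc: $gzsum"

# Total size of the same files after decompression.
rawsum=$(find /usr/share/doc -type f -name '*.gz' -exec zcat '{}' + | wc -c)
echo "Total size of uncompressed *.gz files in /usr/share/doc: $rawsum"

# Recompress each file with xz separately and sum the resulting sizes.
# File names are passed as arguments rather than spliced into the sh -c
# string, so names containing spaces or shell metacharacters are safe.
xzsum=$(find /usr/share/doc -type f -name '*.gz' -exec sh -c '
            for f in "$@"; do zcat "$f" | xz | wc -c; done' sh '{}' + |
        awk '{ s += $1 } END { print s }')
echo "Total size of *.gz files in /usr/share/doc converted into *.xz: $xzsum"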

