
Re: Why does doc packages need to contain gzipped files?



Hi, 

On Sat, Jun 24, 2006 at 05:30:59PM +0200, Mario 'BitKoenig' Holbe wrote:
> Preben Randhol <randhol@pvv.org> wrote:
> > My point is that if I choose to install a doc packages I intend to use
> > it frequently and would therefore like that it is user friendly rather
> > than that one has squeezed some few kilobytes out by gzipping files. If
> 
> Agreed. Particularly since the saving isn't sooo big at all.
> On my - of course, not representative - workstation an uncompressed
> doc/ tree takes only about a third more space (and this includes all
> the ChangeLogs, READMEs etc. shipped with each package).
> 
> root@darkside:~# du -sh /usr/share/doc
> 839M    /usr/share/doc
> root@darkside:~# cp -ia /usr/share/doc /var/tmp
> root@darkside:~# cd /var/tmp/doc
> root@darkside:/var/tmp/doc# find . -type f -name \*.gz -print0 | xargs -0 gzip -d
> gzip: ./kernel-package/Rationale already exists;        not overwritten
> gzip: ./kernel-package/HOWTO-Linux-2.6-Woody already exists;    not overwritten
> gzip: ./gcc-4.1-base/.changelog.Debian.gz has 1 other link  -- unchanged
> gzip: ./gcc-4.1-base/changelog.Debian.gz has 1 other link  -- unchanged
> root@darkside:/var/tmp/doc# du -sh .
> 1,3G    .
> root@darkside:/var/tmp/doc#

Interesting statistic. Let me follow up. On my system, "du -sh" reported:
 total            /usr            2.4G
 compressed       /usr/share/doc  239M
 uncompressed     /usr/share/doc  435M

Although this looks like a saving of about 45% within /usr/share/doc
(196M out of 435M), the overall effect is to shrink /usr by less than 10%.
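
As a quick check of that arithmetic, here is a minimal sketch that works
only from the three du figures quoted above. Reading 2.4G as 2400M is my
own rounding, and the function names are made up for illustration.

```shell
#!/bin/sh
# Force the C locale so awk prints a decimal point, not a comma.
LC_ALL=C; export LC_ALL

# Sizes in MB from the du output above (2.4G approximated as 2400M).
usr_total=2400
doc_gz=239
doc_plain=435

# Space saved by compression, as a percentage of the uncompressed doc tree.
doc_saving() {
    awk -v gz="$1" -v plain="$2" \
        'BEGIN { printf "%.1f\n", (plain - gz) * 100 / plain }'
}

# How much /usr would grow, in percent, if the doc tree were uncompressed.
usr_growth() {
    awk -v total="$1" -v gz="$2" -v plain="$3" \
        'BEGIN { printf "%.1f\n", (plain - gz) * 100 / total }'
}

echo "saving inside doc tree: $(doc_saving "$doc_gz" "$doc_plain")%"
echo "growth of /usr if uncompressed: $(usr_growth "$usr_total" "$doc_gz" "$doc_plain")%"
```

With these inputs the first figure comes out at about 45% and the second
at about 8%.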

Let me add some popular big doc packages (tetex, gcc, harden) to my
system and look at what happens a bit more carefully.

 total            /usr            2.5G
 compressed       /usr/share/doc  305M
 uncompressed pdf /usr/share/doc  322M
       ( 25.7% total as told by gzip -l)
       ( 50136586  Compressed <--- 67439000 Before)
 uncompressed all /usr/share/doc  515M

Yes, compressing the PDF files reduces their size by 25.7%, but that
yields a disk space saving of only 17MB. Out of an installation of more
than 2.5GB, this is effectively zero gain. So compressing PDFs has no
real impact on space but slows down reading the doc packages.
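
The same numbers can be worked out from the byte totals that gzip -l
reported. This is only a sketch: the function names are hypothetical, and
reading "2.5G" as 2.5 * 1024^3 bytes is an assumption.

```shell
#!/bin/sh
# Force the C locale so awk prints a decimal point, not a comma.
LC_ALL=C; export LC_ALL

pdf_before=67439000        # uncompressed PDF bytes (gzip -l totals)
pdf_after=50136586         # compressed PDF bytes (gzip -l totals)
install_bytes=2684354560   # 2.5G read as 2.5 * 1024^3 (assumption)

# Bytes saved by compression, expressed in MiB.
saved_mib() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / 1048576 }'
}

# Bytes saved by compression, as a percentage of some reference size.
saved_share() {
    awk -v b="$1" -v a="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * 100 / t }'
}

echo "saved: $(saved_mib "$pdf_before" "$pdf_after") MiB"
echo "share of the PDFs: $(saved_share "$pdf_before" "$pdf_after" "$pdf_before")%"
echo "share of the installation: $(saved_share "$pdf_before" "$pdf_after" "$install_bytes")%"
```

So the saving is roughly 16.5 MiB, which is well under 1% of the whole
installation.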

Wait, is there really less than 70MB of PDF?  Yes, this is true.
Because it is difficult to build nice PDFs from XML/SGML without hitting
FTBFS, many packages do not bother to create PDFs.  Most of the docs
containing PDFs are:

 doc packages from the Debian doc project CVS, built with debiandoc-sgml
 teTeX related documents (mostly in tetex-doc)

The exceptions I found were under:
 shared-mime-info
 xen-doc
 dblatex
 hlatex
 fcitx
 tex-common
 texmf

The thing is, we do not put HTML doc files into tar.gz archives to save
space, and those are the real space eaters.  Putting an unreasonable
restriction on PDF yields no real gain.

Given the facts above, PDF files at least should not be required to be
compressed externally with gzip.

According to its manpage, dh_compress compresses files except the
copyright file, .html and .css files, and files that appear to be
already compressed based on their extensions.  So why not treat PDF as
"compressed", which I thought was already the case?  In that sense we do
not need a policy change, just a minor change in the code so that
dh_compress does what it claims to do.
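
To illustrate the kind of rule I mean, here is a hypothetical shell
sketch.  It is not dh_compress's actual code: the function name and the
list of "already compressed" extensions are made up for illustration, and
other criteria the real tool may apply are left out.  The only point is
the extra *.pdf case.

```shell
#!/bin/sh
# Hypothetical filter: succeeds (returns 0) if a doc file should be gzipped.
should_compress() {
    case "$(basename "$1")" in
        copyright)          return 1 ;;  # never compressed
        *.html|*.css)       return 1 ;;  # web files stay readable by browsers
        *.gz|*.bz2|*.zip)   return 1 ;;  # illustrative "already compressed" list
        *.pdf)              return 1 ;;  # proposed: treat PDF as compressed too
    esac
    return 0
}

for f in doc/README doc/manual.pdf doc/index.html doc/copyright; do
    if should_compress "$f"; then
        echo "$f: compress"
    else
        echo "$f: leave alone"
    fi
done
```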

Opening a PDF file larger than 1MB is very slow, even on a system with
proper automatic gunzipping.  So, pedantic policy arguments aside, we
should leave PDFs uncompressed.

Osamu

PS: I do not feel like using the -X option for now, because the
debhelper default should be the desired behaviour.  But I may change my
mind soon.

Reference:
root@dambo:/var/tmp2#  find . -type f -name \*.pdf.gz -print0 | xargs -0 gzip -l
         compressed        uncompressed  ratio uncompressed_name
             228023              510762  55.4% ./debian-policy/fhs/fhs-2.3.pdf
             486418              682351  28.7% ./debian-policy/policy.pdf
             318890              456439  30.1% ./debian/FAQ/debian-faq.en.pdf
             124536              155443  19.9% ./shared-mime-info/shared-mime-info-spec.pdf
             798976             1239893  35.6% ./Debian/reference/reference.en.pdf
             692308             1063782  34.9% ./Debian/reference/reference.de.pdf
             808949             1245798  35.1% ./Debian/reference/reference.es.pdf
             781341             1202792  35.0% ./Debian/reference/reference.fr.pdf
             828482             1274316  35.0% ./Debian/reference/reference.it.pdf
            1784610             2567959  30.5% ./Debian/reference/reference.ja.pdf
             901423             1340825  32.8% ./Debian/reference/reference.pl.pdf
             833802             1273284  34.5% ./Debian/reference/reference.pt-br.pdf
            2039169             2888341  29.4% ./Debian/reference/reference.zh-cn.pdf
            2125677             3018941  29.6% ./Debian/reference/reference.zh-tw.pdf
             224459              265209  15.4% ./texmf/tetex/TETEXDOC.pdf
             150469              174936  14.0% ./tex-common/Debian-TeX-Policy.pdf
             144848              204476  29.2% ./tetex-doc/etex/base/etex-man.pdf
             137706              166571  17.3% ./tetex-doc/metapost/base/mpgraph.pdf
            1727793             2377781  27.3% ./tetex-doc/help/faq/uktug-faq/uktug-faq.pdf
             467752              515805   9.3% ./tetex-doc/fonts/pxfonts/pxfontsdocA4.pdf
              74228               91788  19.2% ./tetex-doc/fonts/marvosym/marvodoc.pdf
             650391              768727  15.4% ./tetex-doc/fonts/txfonts/txfontsdocA4.pdf
            1003550             1081791   7.2% ./tetex-doc/fonts/antt/AntykwaTorunska-doc-en-2_01.pdf
             998224             1071514   6.8% ./tetex-doc/fonts/antt/AntykwaTorunska-doc-pl-2_01.pdf
             110332              131462  16.1% ./tetex-doc/fonts/gothic/suet.pdf
              96468              167372  42.4% ./tetex-doc/latex/mwcls/mwclsdoc.pdf
             368651              531205  30.6% ./tetex-doc/latex/SIunits/SIunits.pdf
              19531               23819  18.1% ./tetex-doc/latex/koma-script/koma-script.pdf
             998007             1486421  32.9% ./tetex-doc/latex/koma-script/scrguide.pdf
             865641             1309457  33.9% ./tetex-doc/latex/koma-script/scrguien.pdf
             347058              407399  14.8% ./tetex-doc/latex/amscls/amsrefs.pdf
             108059              123073  12.2% ./tetex-doc/latex/amscls/mathscinet.pdf
              88227              102227  13.7% ./tetex-doc/latex/amscls/textcmds.pdf
             180372              203601  11.4% ./tetex-doc/latex/amscls/amsrdoc.pdf
              48096               60826  21.0% ./tetex-doc/latex/amscls/amsthdoc.pdf
             145007              175195  17.2% ./tetex-doc/latex/amscls/instr-l.pdf
              38366               50063  23.4% ./tetex-doc/latex/amscls/thmtest.pdf
              98843              111235  11.2% ./tetex-doc/latex/amscls/changes.pdf
             126424              163156  22.5% ./tetex-doc/latex/geometry/geometry.pdf
            1909475             2390133  20.1% ./tetex-doc/latex/general/lshort.pdf
             116208              125962   7.8% ./tetex-doc/latex/lettrine/demo.pdf
             119578              151708  21.2% ./tetex-doc/latex/psnfss/psnfss2e.pdf
              11241               17192  34.8% ./tetex-doc/latex/pdfpages/dummy.pdf
               9368               12470  25.1% ./tetex-doc/latex/pdfpages/dummy-l.pdf
             132073              288035  54.2% ./tetex-doc/latex/oberdiek/twoopt.pdf
             163117              365369  55.4% ./tetex-doc/latex/oberdiek/alphalph.pdf
             150700              344174  56.2% ./tetex-doc/latex/oberdiek/pagesel.pdf
             148957              207393  28.2% ./tetex-doc/latex/oberdiek/hypcap.pdf
             148888              331597  55.1% ./tetex-doc/latex/oberdiek/hypbmsec.pdf
              25800               36449  29.3% ./tetex-doc/latex/amsmath/subeqn.pdf
              47269               58756  19.6% ./tetex-doc/latex/amsmath/technote.pdf
             231961              276719  16.2% ./tetex-doc/latex/amsmath/testmath.pdf
             221220              520025  57.5% ./tetex-doc/latex/amsmath/amsldoc.pdf
             235631              346068  31.9% ./tetex-doc/latex/hyperref/slides.pdf
             107295              161506  33.6% ./tetex-doc/latex/hyperref/paper.pdf
             258953              342982  24.5% ./tetex-doc/latex/hyperref/manual.pdf
             205919              234432  12.2% ./tetex-doc/latex/pict2e/pict2e.pdf
            1077900             1852923  41.8% ./tetex-doc/latex/memoir/memman.pdf
             392068              422277   7.2% ./tetex-doc/latex/dk-bib/dk-bib.pdf
             290012              337327  14.0% ./tetex-doc/latex/ntheorem/ntheorem.pdf
              88618              118233  25.1% ./tetex-doc/latex/eulervm/eulervm.pdf
             126391              172832  26.9% ./tetex-doc/tetex/eurotex98-te.pdf
             164354              242186  32.2% ./tetex-doc/fontinst/talks/et99-font-tutorial.pdf
             169423              201767  16.0% ./tetex-doc/fontinst/talks/et99-font-tables.pdf
             150382              172067  12.6% ./tetex-doc/fontinst/manual/intro98.pdf
             342214              381721  10.4% ./tetex-doc/fontinst/manual/fontinst.pdf
             332317              478831  30.6% ./tetex-doc/eplain/eplain.pdf
             702146              972471  27.8% ./tetex-doc/generic/pstricks/pstricks-add-doc.pdf
             275785              338150  18.5% ./tetex-doc/generic/pstricks/pst-lens.pdf
             115452              133800  13.7% ./tetex-doc/generic/pstricks/pst-gr3d.pdf
              41890               57352  27.0% ./tetex-doc/generic/pstricks/pst-blur.pdf
              91042              110916  17.9% ./tetex-doc/generic/pstricks/pst-slpe.pdf
             517853              561200   7.7% ./tetex-doc/generic/pstricks/pst-osci.pdf
            5096682             5187070   1.7% ./tetex-doc/generic/pstricks/vue3d-e.pdf
            1020946             1164106  12.3% ./tetex-doc/generic/pstricks/pst-3dplot-doc.pdf
             256937              307982  16.6% ./tetex-doc/generic/pstricks/pst-uml-doc.pdf
             143851              162112  11.3% ./tetex-doc/generic/pstricks/pst-math.pdf
              44177               55473  20.4% ./tetex-doc/generic/pstricks/tst-poly.pdf
             115290              139994  17.7% ./tetex-doc/generic/pstricks/pst-poly.pdf
            1192204             1381045  13.7% ./tetex-doc/generic/pstricks/pst-fill-doc.pdf
              78925               92356  14.6% ./tetex-doc/generic/pstricks/psgomanual.pdf
              80617              119595  32.6% ./tetex-doc/generic/pstricks/pstnews97-15.pdf
             140966              175120  19.5% ./tetex-doc/generic/pstricks/pst-circ-doc.pdf
             176058              205686  14.4% ./tetex-doc/generic/mfpic/mfpguide.pdf
             278090              407825  31.8% ./tetex-doc/generic/mfpic/mfpman.pdf
             695586              763788   8.9% ./tetex-doc/generic/xypic/xyrefer.pdf
             199812              229223  12.8% ./tetex-doc/generic/xypic/xyguide.pdf
             103151              130183  20.8% ./tetex-doc/generic/spanish/division.pdf
              60593              105735  42.7% ./tetex-doc/generic/ukrhyph/rules90.pdf
              59777               70816  15.6% ./tetex-doc/generic/ukrhyph/rules_ph.pdf
              86522              106565  18.8% ./tetex-doc/generic/ukrhyph/rules60.pdf
             151022              206964  27.0% ./tetex-doc/generic/tex-ps/cmyk-hax/cmyk-doc.pdf
              32172               32811   2.0% ./tetex-doc/pdftex/examples/pic.pdf
             396897              618807  35.9% ./tetex-doc/programs/web2c.pdf
             381800              414758   8.0% ./tetex-doc/programs/dvips.pdf
             364550              567905  35.8% ./tetex-doc/programs/kpathsea.pdf
             116535              152462  23.6% ./tetex-doc/programs/dvipdfm.pdf
             131266              167813  21.8% ./tetex-doc/programs/dvipng.pdf
            1088510             1664881  34.6% ./tetex-doc/programs/texinfo.pdf
             812409              985438  17.6% ./hlatex/guide/hlguide.pdf
             236849              290311  18.4% ./debiandoc-sgml-doc/debiandoc-sgml.en.pdf
              91005              103816  12.4% ./xen/pdf/interface.pdf
             145649              164917  11.7% ./xen/pdf/user.pdf
             323613              420926  23.1% ./dblatex/manual.pdf
             168530              181283   7.1% ./fcitx/fcitx3.pdf
            2346394             3590496  34.7% ./ddd-doc/ddd.pdf
             145928              250564  41.8% ./ddd-doc/ddd-themes.pdf
             790846             1166650  32.2% ./harden-doc/securing-debian-howto.de.pdf
             730825             1093197  33.2% ./harden-doc/securing-debian-howto.en.pdf
             758996             1109269  31.6% ./harden-doc/securing-debian-howto.fr.pdf
           50136586            67439000  25.7% (totals)
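
The totals line above could also be recomputed independently of how many
times gzip gets invoked.  A minimal sketch, assuming the four-column
gzip -l layout shown in this listing; the function name pdf_gz_totals is
made up.

```shell
#!/bin/sh
# Force the C locale so awk prints a decimal point, not a comma.
LC_ALL=C; export LC_ALL

# Print "compressed uncompressed saving%" summed over all *.pdf.gz under $1.
pdf_gz_totals() {
    find "$1" -type f -name '*.pdf.gz' -exec gzip -l {} + |
    awk '
        # Keep only per-file lines: skip the header and any "(totals)" line.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $NF != "(totals)" {
            c += $1; u += $2
        }
        END {
            if (u > 0)
                printf "%.0f %.0f %.1f%%\n", c, u, (u - c) * 100 / u
            else
                print "0 0 0.0%"
        }'
}
```

It would be called as, for example, pdf_gz_totals /usr/share/doc.  Using
find -exec instead of xargs avoids running gzip at all when nothing
matches.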

Regards,

Osamu


