Re: Why does doc packages need to contain gzipped files?
Hi,
On Sat, Jun 24, 2006 at 05:30:59PM +0200, Mario 'BitKoenig' Holbe wrote:
> Preben Randhol <randhol@pvv.org> wrote:
> > My point is that if I choose to install a doc packages I intend to use
> > it frequently and would therefore like that it is user friendly rather
> > than that one has squeezed some few kilobytes out by gzipping files. If
>
> Agreed. Particularly since the saving isn't sooo big at all.
> On my - of course, not representative - workstation an uncompressed
> doc/ tree takes only about a third more space (and this includes all
> the ChangeLogs, READMEs etc. shipped with each package).
>
> root@darkside:~# du -sh /usr/share/doc
> 839M /usr/share/doc
> root@darkside:~# cp -ia /usr/share/doc /var/tmp
> root@darkside:~# cd /var/tmp/doc
> root@darkside:/var/tmp/doc# find . -type f -name \*.gz -print0 | xargs -0 gzip -d
> gzip: ./kernel-package/Rationale already exists; not overwritten
> gzip: ./kernel-package/HOWTO-Linux-2.6-Woody already exists; not overwritten
> gzip: ./gcc-4.1-base/.changelog.Debian.gz has 1 other link -- unchanged
> gzip: ./gcc-4.1-base/changelog.Debian.gz has 1 other link -- unchanged
> root@darkside:/var/tmp/doc# du -sh .
> 1,3G .
> root@darkside:/var/tmp/doc#
Interesting stat. Let me follow up. On my system, "du -sh ." returned:
  total /usr                       2.4G
  compressed /usr/share/doc        239M
  uncompressed /usr/share/doc      435M
Although this looks like a 40% saving in space for the doc tree, the
overall impact is a shrink of less than 10% of /usr (196M out of 2.4G).
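To spell out the arithmetic behind these two percentages, here is a small
shell sketch. The pct helper is a throwaway function made up for this
illustration (it is not part of any package), and 2.4G is taken as
roughly 2458M:

```shell
# pct PART WHOLE: print PART as a percentage of WHOLE, one decimal place.
# (Hypothetical helper, defined only for this example.)
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

saved=$((435 - 239))   # MB freed by gzip in /usr/share/doc (196)
pct "$saved" 435       # share of the uncompressed doc tree: prints 45.1
pct "$saved" 2458      # share of all of /usr: prints 8.0
```

So the saving is large relative to the doc tree alone, but small relative
to the whole installation.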
Let me add some popular big doc packages (tetex, gcc, harden) to my
system and look at what happens a bit more carefully.
  total /usr                           2.5G
  compressed /usr/share/doc            305M
  uncompressed pdf /usr/share/doc      322M
      (25.7% total as told by gzip -l)
      (50136586 compressed <--- 67439000 before)
  uncompressed all /usr/share/doc      515M
Yes, the 25.7% saving in PDF file size yields a disk space saving of
only 17MB. Out of an installation of more than 2.5GB, this is
practically zero gain. So compressing PDFs has no real impact on space,
but it slows down reading the doc packages.
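The totals can also be recomputed from the per-file rows of "gzip -l".
A minimal sketch, assuming a hypothetical helper named sum_gzip_list
(my own name, not an existing tool) that reads "gzip -l" output on stdin:

```shell
# sum_gzip_list: read `gzip -l` output on stdin and print the total
# compressed bytes, total uncompressed bytes, and the overall saving.
# (Hypothetical helper, defined only for this example.)
sum_gzip_list() {
    awk '
        # Keep only per-file rows: the first two fields must be numeric.
        # Skip the "(totals)" row that gzip adds when listing several files,
        # so the numbers are not counted twice.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $NF != "(totals)" {
            comp += $1
            uncomp += $2
        }
        END {
            if (uncomp > 0)
                printf "%.0f %.0f %.1f%%\n", comp, uncomp, 100 * (uncomp - comp) / uncomp
        }'
}
```

It could be fed with the same pipeline as in the reference listing:

  find . -type f -name \*.pdf.gz -print0 | xargs -0 gzip -l | sum_gzip_list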
Wait, there is less than 70MB of PDF? Yes, this is true. Because it is
difficult to build nice PDFs from XML/SGML without hitting FTBFS, many
packages do not bother with PDF creation. Most of the docs containing
PDF are:
Debian doc project CVS related doc packages built with debiandoc-sgml
teTeX related documents (mostly in tetex-doc)
The exceptions I found were under:
shared-mime-info
xen-doc
dblatex
hlatex
fcitx
tex-common
texmf
The thing is, we do not put HTML doc files in a tar.gz'ed archive to
save space, and those are the real space eater. Putting an unreasonable
restriction on PDF yields no real gain.
Given the facts above, PDF at least should not be required to be
compressed externally with gzip.
Per its manpage, dh_compress compresses files (except the copyright
file, .html and .css files, and files that appear to be already
compressed based on their extensions). So why not treat PDF as
"compressed", which I thought was already the case? In this sense, we
do not need a policy change, just a minor change in code to make
dh_compress do what it claims to do.
Opening a PDF file of over 1MB is very slow even on a system with
proper auto-ungzipping. So, pedantic policy arguments aside, we should
leave PDFs uncompressed.
Osamu
PS: I did not feel like using the -X option for now, because the
debhelper default should be the desired behaviour. But I may change my
mind soon.
Reference:
root@dambo:/var/tmp2# find . -type f -name \*.pdf.gz -print0 | xargs -0 gzip -l
compressed uncompressed ratio uncompressed_name
228023 510762 55.4% ./debian-policy/fhs/fhs-2.3.pdf
486418 682351 28.7% ./debian-policy/policy.pdf
318890 456439 30.1% ./debian/FAQ/debian-faq.en.pdf
124536 155443 19.9% ./shared-mime-info/shared-mime-info-spec.pdf
798976 1239893 35.6% ./Debian/reference/reference.en.pdf
692308 1063782 34.9% ./Debian/reference/reference.de.pdf
808949 1245798 35.1% ./Debian/reference/reference.es.pdf
781341 1202792 35.0% ./Debian/reference/reference.fr.pdf
828482 1274316 35.0% ./Debian/reference/reference.it.pdf
1784610 2567959 30.5% ./Debian/reference/reference.ja.pdf
901423 1340825 32.8% ./Debian/reference/reference.pl.pdf
833802 1273284 34.5% ./Debian/reference/reference.pt-br.pdf
2039169 2888341 29.4% ./Debian/reference/reference.zh-cn.pdf
2125677 3018941 29.6% ./Debian/reference/reference.zh-tw.pdf
224459 265209 15.4% ./texmf/tetex/TETEXDOC.pdf
150469 174936 14.0% ./tex-common/Debian-TeX-Policy.pdf
144848 204476 29.2% ./tetex-doc/etex/base/etex-man.pdf
137706 166571 17.3% ./tetex-doc/metapost/base/mpgraph.pdf
1727793 2377781 27.3% ./tetex-doc/help/faq/uktug-faq/uktug-faq.pdf
467752 515805 9.3% ./tetex-doc/fonts/pxfonts/pxfontsdocA4.pdf
74228 91788 19.2% ./tetex-doc/fonts/marvosym/marvodoc.pdf
650391 768727 15.4% ./tetex-doc/fonts/txfonts/txfontsdocA4.pdf
1003550 1081791 7.2% ./tetex-doc/fonts/antt/AntykwaTorunska-doc-en-2_01.pdf
998224 1071514 6.8% ./tetex-doc/fonts/antt/AntykwaTorunska-doc-pl-2_01.pdf
110332 131462 16.1% ./tetex-doc/fonts/gothic/suet.pdf
96468 167372 42.4% ./tetex-doc/latex/mwcls/mwclsdoc.pdf
368651 531205 30.6% ./tetex-doc/latex/SIunits/SIunits.pdf
19531 23819 18.1% ./tetex-doc/latex/koma-script/koma-script.pdf
998007 1486421 32.9% ./tetex-doc/latex/koma-script/scrguide.pdf
865641 1309457 33.9% ./tetex-doc/latex/koma-script/scrguien.pdf
347058 407399 14.8% ./tetex-doc/latex/amscls/amsrefs.pdf
108059 123073 12.2% ./tetex-doc/latex/amscls/mathscinet.pdf
88227 102227 13.7% ./tetex-doc/latex/amscls/textcmds.pdf
180372 203601 11.4% ./tetex-doc/latex/amscls/amsrdoc.pdf
48096 60826 21.0% ./tetex-doc/latex/amscls/amsthdoc.pdf
145007 175195 17.2% ./tetex-doc/latex/amscls/instr-l.pdf
38366 50063 23.4% ./tetex-doc/latex/amscls/thmtest.pdf
98843 111235 11.2% ./tetex-doc/latex/amscls/changes.pdf
126424 163156 22.5% ./tetex-doc/latex/geometry/geometry.pdf
1909475 2390133 20.1% ./tetex-doc/latex/general/lshort.pdf
116208 125962 7.8% ./tetex-doc/latex/lettrine/demo.pdf
119578 151708 21.2% ./tetex-doc/latex/psnfss/psnfss2e.pdf
11241 17192 34.8% ./tetex-doc/latex/pdfpages/dummy.pdf
9368 12470 25.1% ./tetex-doc/latex/pdfpages/dummy-l.pdf
132073 288035 54.2% ./tetex-doc/latex/oberdiek/twoopt.pdf
163117 365369 55.4% ./tetex-doc/latex/oberdiek/alphalph.pdf
150700 344174 56.2% ./tetex-doc/latex/oberdiek/pagesel.pdf
148957 207393 28.2% ./tetex-doc/latex/oberdiek/hypcap.pdf
148888 331597 55.1% ./tetex-doc/latex/oberdiek/hypbmsec.pdf
25800 36449 29.3% ./tetex-doc/latex/amsmath/subeqn.pdf
47269 58756 19.6% ./tetex-doc/latex/amsmath/technote.pdf
231961 276719 16.2% ./tetex-doc/latex/amsmath/testmath.pdf
221220 520025 57.5% ./tetex-doc/latex/amsmath/amsldoc.pdf
235631 346068 31.9% ./tetex-doc/latex/hyperref/slides.pdf
107295 161506 33.6% ./tetex-doc/latex/hyperref/paper.pdf
258953 342982 24.5% ./tetex-doc/latex/hyperref/manual.pdf
205919 234432 12.2% ./tetex-doc/latex/pict2e/pict2e.pdf
1077900 1852923 41.8% ./tetex-doc/latex/memoir/memman.pdf
392068 422277 7.2% ./tetex-doc/latex/dk-bib/dk-bib.pdf
290012 337327 14.0% ./tetex-doc/latex/ntheorem/ntheorem.pdf
88618 118233 25.1% ./tetex-doc/latex/eulervm/eulervm.pdf
126391 172832 26.9% ./tetex-doc/tetex/eurotex98-te.pdf
164354 242186 32.2% ./tetex-doc/fontinst/talks/et99-font-tutorial.pdf
169423 201767 16.0% ./tetex-doc/fontinst/talks/et99-font-tables.pdf
150382 172067 12.6% ./tetex-doc/fontinst/manual/intro98.pdf
342214 381721 10.4% ./tetex-doc/fontinst/manual/fontinst.pdf
332317 478831 30.6% ./tetex-doc/eplain/eplain.pdf
702146 972471 27.8% ./tetex-doc/generic/pstricks/pstricks-add-doc.pdf
275785 338150 18.5% ./tetex-doc/generic/pstricks/pst-lens.pdf
115452 133800 13.7% ./tetex-doc/generic/pstricks/pst-gr3d.pdf
41890 57352 27.0% ./tetex-doc/generic/pstricks/pst-blur.pdf
91042 110916 17.9% ./tetex-doc/generic/pstricks/pst-slpe.pdf
517853 561200 7.7% ./tetex-doc/generic/pstricks/pst-osci.pdf
5096682 5187070 1.7% ./tetex-doc/generic/pstricks/vue3d-e.pdf
1020946 1164106 12.3% ./tetex-doc/generic/pstricks/pst-3dplot-doc.pdf
256937 307982 16.6% ./tetex-doc/generic/pstricks/pst-uml-doc.pdf
143851 162112 11.3% ./tetex-doc/generic/pstricks/pst-math.pdf
44177 55473 20.4% ./tetex-doc/generic/pstricks/tst-poly.pdf
115290 139994 17.7% ./tetex-doc/generic/pstricks/pst-poly.pdf
1192204 1381045 13.7% ./tetex-doc/generic/pstricks/pst-fill-doc.pdf
78925 92356 14.6% ./tetex-doc/generic/pstricks/psgomanual.pdf
80617 119595 32.6% ./tetex-doc/generic/pstricks/pstnews97-15.pdf
140966 175120 19.5% ./tetex-doc/generic/pstricks/pst-circ-doc.pdf
176058 205686 14.4% ./tetex-doc/generic/mfpic/mfpguide.pdf
278090 407825 31.8% ./tetex-doc/generic/mfpic/mfpman.pdf
695586 763788 8.9% ./tetex-doc/generic/xypic/xyrefer.pdf
199812 229223 12.8% ./tetex-doc/generic/xypic/xyguide.pdf
103151 130183 20.8% ./tetex-doc/generic/spanish/division.pdf
60593 105735 42.7% ./tetex-doc/generic/ukrhyph/rules90.pdf
59777 70816 15.6% ./tetex-doc/generic/ukrhyph/rules_ph.pdf
86522 106565 18.8% ./tetex-doc/generic/ukrhyph/rules60.pdf
151022 206964 27.0% ./tetex-doc/generic/tex-ps/cmyk-hax/cmyk-doc.pdf
32172 32811 2.0% ./tetex-doc/pdftex/examples/pic.pdf
396897 618807 35.9% ./tetex-doc/programs/web2c.pdf
381800 414758 8.0% ./tetex-doc/programs/dvips.pdf
364550 567905 35.8% ./tetex-doc/programs/kpathsea.pdf
116535 152462 23.6% ./tetex-doc/programs/dvipdfm.pdf
131266 167813 21.8% ./tetex-doc/programs/dvipng.pdf
1088510 1664881 34.6% ./tetex-doc/programs/texinfo.pdf
812409 985438 17.6% ./hlatex/guide/hlguide.pdf
236849 290311 18.4% ./debiandoc-sgml-doc/debiandoc-sgml.en.pdf
91005 103816 12.4% ./xen/pdf/interface.pdf
145649 164917 11.7% ./xen/pdf/user.pdf
323613 420926 23.1% ./dblatex/manual.pdf
168530 181283 7.1% ./fcitx/fcitx3.pdf
2346394 3590496 34.7% ./ddd-doc/ddd.pdf
145928 250564 41.8% ./ddd-doc/ddd-themes.pdf
790846 1166650 32.2% ./harden-doc/securing-debian-howto.de.pdf
730825 1093197 33.2% ./harden-doc/securing-debian-howto.en.pdf
758996 1109269 31.6% ./harden-doc/securing-debian-howto.fr.pdf
50136586 67439000 25.7% (totals)
Regards,
Osamu