
Re: PDF files and dh_compress



> On 5/9/06, Yaroslav Halchenko <debian@onerussian.com> wrote:
> >* dh_compress doesn't compress some other files based on their extension,
> > including .zip files. PDF (to my knowledge) uses zip internally to
> > compress the document. So why should PDF be gzipped if .zip is not?
> Probably because .zip files are already compressed better than .pdf?
Indeed, they are; gzip -9 barely shrinks them:

itanix:/usr# find /usr -iname '*.zip' | while IFS= read -r fname; do nfname=${fname//\//_}; gzip -9 -c "$fname" >| "/var/tmp/zips/${nfname}.gz"; done
itanix:/usr# du -sh /var/tmp/zips
51M     /var/tmp/zips

itanix:/usr# find /usr -iname '*.zip' -print0 | xargs -0 du -ch
54M     total
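The same comparison can be made without keeping a gzipped copy of every file around. Below is a minimal POSIX sh sketch (the function name gzsaving is made up for illustration) that, for every file matching a pattern, adds up the original size and the size of its gzip -9 stream and prints the totals. It assumes a find that supports -iname, as GNU find does.

```shell
#!/bin/sh
# gzsaving DIR PATTERN   (hypothetical helper, e.g. gzsaving /usr '*.zip')
# Sums the original sizes and the gzip -9 sizes of all matching files,
# streaming each file through gzip instead of writing .gz copies to disk.
gzsaving() {
    find "$1" -type f -iname "$2" -exec sh -c '
        for f do
            # one line per file: original bytes, gzip -9 bytes
            printf "%s %s\n" "$(wc -c < "$f")" "$(gzip -9 -c "$f" | wc -c)"
        done' sh {} + |
    awk '{ o += $1; g += $2 }
         END {
             if (o == 0) { print "no matching files"; exit }
             printf "original: %d bytes, gzip -9: %d bytes (%.1f%% saved)\n", o, g, 100 * (o - g) / o
         }'
}
```

For example, `gzsaving /usr '*.zip'` or `gzsaving /usr/share/doc '*.pdf'` gives the totals for each file type in one pass; file names with spaces are safe because they are passed via `-exec ... {} +` rather than through a pipe.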

> >* Although there is a way to view pdf.gz without explicit decompression
> > (use see or xzpdf), it is inconvenient to use from firefox, for
> > instance (?)
> What about improving transparent decompression somehow?
Well... unification via mailcap is nice, but as I mentioned before, it
would be trickier to make it work from firefox (it treats all .gz files
as gzip archives right out of the box), and it would require tuning each
application to make it handle pdf.gz, or maybe pdf.bz2...

Besides, such unification would take away my own choice... currently,
for quick viewing I prefer xpdf, while for a long read I would like
to use acroread...

Sure, we could create some wrapper that works like

  zview acroread *.pdf.gz

but I am not sure it is really worth it.
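A minimal sketch of such a wrapper, assuming a plain POSIX shell. zview is a hypothetical name, not an existing Debian tool. It decompresses every .gz or .bz2 argument into a private temporary directory, runs the chosen viewer on the results, and removes the directory when the viewer exits.

```shell
#!/bin/sh
# zview VIEWER FILE...   (hypothetical wrapper, e.g. zview acroread *.pdf.gz)
# Decompresses .gz and .bz2 arguments into a temporary directory, runs
# VIEWER on the resulting files (other arguments are passed unchanged),
# and removes the temporary directory once VIEWER exits.
zview() {
    viewer=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    # Rotate through the arguments once: take each one off the front and
    # append its (possibly decompressed) replacement at the back.
    n=$#
    while [ "$n" -gt 0 ]; do
        f=$1
        shift
        n=$((n - 1))
        base=${f##*/}
        case $f in
            *.gz)  out=$tmpdir/${base%.gz};  gzip -dc -- "$f" > "$out" ;;
            *.bz2) out=$tmpdir/${base%.bz2}; bzip2 -dc -- "$f" > "$out" ;;
            *)     out=$f ;;
        esac
        set -- "$@" "$out"
    done
    "$viewer" "$@"
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

This only works with viewers that stay in the foreground until closed; a viewer that hands the file to an already running instance and returns at once would find the temporary copies gone. Two compressed files with the same base name would also overwrite each other, which a real tool would have to handle.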

As I have shown, leaving the PDFs of the whole of sid uncompressed would
cost just 150M of space, which is minuscule... and on old boxes
(routers/gateways) you wouldn't want to install all the -doc packages
(which should contain most of the PDF files) in any case.

Additional tuning of PDFs with pdftk might help... a quick test on a
locally installed box (so just with a subset of the PDFs):

 83M    pdfs.orig      -- original pdf and pdf.gz files found on the system
106M    pdfs           -- the pdf.gz files gunzipped, plus the original pdfs
474M    pdfs.unc       -- uncompressed with pdftk
102M    pdfs.compress  -- compressed back with pdftk
101M    pdfs.best      -- per file, the smaller of pdfs and pdfs.compress

If we gzip -9 the original .pdf files instead:

 76M    pdfs.gz

So it does make sense to invoke pdftk just to recompress badly produced
PDF files (those with uncompressed streams).
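A sketch of the recompress-only-if-smaller step, assuming pdftk's `output FILE compress` form (plus `dont_ask` to suppress interactive prompts); pdf_recompress is a hypothetical helper name. For each PDF it writes a recompressed copy to a temporary directory and overwrites the original only when the copy is strictly smaller, in the spirit of the per-file pdfs.best selection.

```shell
#!/bin/sh
# pdf_recompress FILE...   (hypothetical helper; needs pdftk installed)
# Recompresses each PDF with pdftk and keeps the new version only when it
# is strictly smaller than the original.
pdf_recompress() {
    for pdf do
        work=$(mktemp -d) || return 1
        candidate=$work/recompressed.pdf
        # "output FILE compress" asks pdftk to (re)compress the page streams;
        # dont_ask stops it from prompting when it runs into a problem.
        if pdftk "$pdf" output "$candidate" compress dont_ask 2>/dev/null; then
            old=$(wc -c < "$pdf" | tr -d ' ')
            new=$(wc -c < "$candidate" | tr -d ' ')
            if [ "$new" -lt "$old" ]; then
                # Overwrite in place (rather than mv) so that the
                # original file's owner and permissions are kept.
                cat "$candidate" > "$pdf"
            fi
        fi
        rm -rf "$work"
    done
}
```

Something like `pdf_recompress /usr/share/doc/*/*.pdf` would touch only the files pdftk can actually shrink; note that in the numbers above gzip -9 still wins overall, so this targets the badly produced PDFs rather than replacing compression altogether.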

-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]



