
Re: PDF files and dh_compress



Martin Wuertele <maxx@debian.org> wrote:

> * Eduard Bloch <edi@gmx.de> [2006-05-09 20:05]:
>
>> #include <hallo.h>
>> * Yaroslav Halchenko [Tue, May 09 2006, 01:15:54PM]:
>> > Dear Developers,
>> > 
>> > I've raised this discussion at -mentors first [1] but I think it is worth
>> > asking on a devel list since no definite decision was reached and I
>> > could not find similar discussion in the archives.
>> 
>> I am strongly against compressing PDFs if the compression ratio is
>> miserable, which is IMO the case for most PDFs nowadays. Take our policy
>> document for example - compression saves less than 30 percent and throws
>> unneeded stones in the way of potential readers.
>  
> How about compressing all generated pdf with eg pdftk instead of gzip?
> That would save on space without troubling potential readers.

Most PDF files in Debian are already compressed, at least those that
are generated on a Debian system with TeX somehow involved.

frank@riesling:~/area$ pdftk policy.pdf output policy.unc.pdf uncompress
frank@riesling:~/area$ pdftk policy.unc.pdf output policy.recomp.pdf compress
frank@riesling:~/area$ lh policy.*
-rw-r--r--  1 frank frank 657K 2006-05-09 21:18 policy.pdf
-rw-r--r--  1 frank frank 667K 2006-05-09 21:19 policy.recomp.pdf
-rw-r--r--  1 frank frank 1.4M 2006-05-09 21:19 policy.unc.pdf
frank@riesling:~/area$ lh /usr/share/doc/debian-policy/policy.pdf.gz  
-rw-r--r--  1 root root 471K 2006-04-26 07:27 /usr/share/doc/debian-policy/policy.pdf.gz

So uncompressing and recompressing with pdftk gives a slightly larger
file, whereas gzipping the original PDF saves 28%.  In other words, the
PDF's built-in ZIP (Flate) compression saves 53% compared to the
uncompressed PDF file, and gzipping on top of that brings the total
saving to 67%.  And bzip2 on the uncompressed PDF file yields

-rw-r--r--  1 frank frank 332K 2006-05-09 21:19 policy.unc.pdf.bz2

or a 76% space saving.
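
As a rough cross-check of the percentages above, here is a minimal
sketch of the arithmetic.  The sizes are the rounded values from the
listings; taking 1.4M as 1400K is an assumption, so the results can be
off by about one percentage point from the figures quoted in the text.
The names `saving_percent` and `SIZES_K` are made up for illustration.

```python
# Sizes in K as printed in the listings above. The uncompressed size is
# shown only as "1.4M", so 1400 is an approximation (assumption).
SIZES_K = {
    "policy.pdf": 657,
    "policy.recomp.pdf": 667,
    "policy.unc.pdf": 1400,
    "policy.pdf.gz": 471,
    "policy.unc.pdf.bz2": 332,
}


def saving_percent(original_k, compressed_k):
    """Return the space saving of compressed_k relative to original_k, in percent."""
    return 100.0 * (1.0 - compressed_k / original_k)


if __name__ == "__main__":
    pairs = [
        ("gzip vs. original PDF", "policy.pdf", "policy.pdf.gz"),
        ("PDF compression vs. uncompressed", "policy.unc.pdf", "policy.pdf"),
        ("gzip vs. uncompressed", "policy.unc.pdf", "policy.pdf.gz"),
        ("bzip2 vs. uncompressed", "policy.unc.pdf", "policy.unc.pdf.bz2"),
    ]
    for label, original, compressed in pairs:
        percent = saving_percent(SIZES_K[original], SIZES_K[compressed])
        print(f"{label}: about {percent:.0f}% saved")
```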


To me, this suggests that in general it's not worth compressing
already-compressed PDF files with gzip; if we want to go for size, we
should use bzip2.
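
To repeat this kind of comparison on other files, something like the
following sketch could be used.  It relies only on Python's standard
`gzip` and `bz2` modules with their default settings, which is an
assumption and need not match what gzip(1), bzip2(1) or dh_compress
produce byte for byte; the function names are hypothetical.

```python
import bz2
import gzip
import sys


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of data as-is, gzipped, and bzip2-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
    }


def measure_file(path: str) -> dict:
    """Read a file and return its raw and compressed sizes."""
    with open(path, "rb") as handle:
        return compressed_sizes(handle.read())


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 pdfsizes.py policy.pdf policy.unc.pdf
    for path in sys.argv[1:]:
        sizes = measure_file(path)
        if sizes["raw"] == 0:
            print(f"{path}: empty file, skipped")
            continue
        for name, size in sizes.items():
            saved = 100.0 * (1.0 - size / sizes["raw"])
            print(f"{path} {name}: {size} bytes ({saved:.0f}% saved)")
```

Running it on both the original and the pdftk-uncompressed file shows
how much an outer compressor still gains on each.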


However, we have to keep in mind that some -doc packages consist mainly
of PDF files and would significantly increase in size if their PDFs were
no longer compressed (or would need some additional splitting).

Regards, Frank

-- 
Frank Küster
Single Molecule Spectroscopy, Protein Folding @ Inst. f. Biochemie, Univ. Zürich
Debian Developer (teTeX)


