Re: PDF files and dh_compress
Martin Wuertele <maxx@debian.org> wrote:
> * Eduard Bloch <edi@gmx.de> [2006-05-09 20:05]:
>
>> #include <hallo.h>
>> * Yaroslav Halchenko [Tue, May 09 2006, 01:15:54PM]:
>> > Dear Developers,
>> >
>> > I've raised this discussion at -mentors first [1] but I think it is worth
>> > asking on a devel list since no definite decision was reached and I
>> > could not find similar discussion in the archives.
>>
>> I am strongly against compressing PDFs if the compression ratio is
>> miserable, which is IMO the case for most PDFs nowadays. Take our policy
>> document for example - compression saves less than 30 percent and throws
>> unneeded stones in the way of potential readers.
>
> How about compressing all generated PDFs with e.g. pdftk instead of gzip?
> That would save on space without troubling potential readers.
Most PDF files in Debian are already compressed, at least those which
are generated on a Debian system with TeX involved somewhere:
frank@riesling:~/area$ pdftk policy.pdf output policy.unc.pdf uncompress
frank@riesling:~/area$ pdftk policy.unc.pdf output policy.recomp.pdf compress
frank@riesling:~/area$ lh policy.*
-rw-r--r-- 1 frank frank 657K 2006-05-09 21:18 policy.pdf
-rw-r--r-- 1 frank frank 667K 2006-05-09 21:19 policy.recomp.pdf
-rw-r--r-- 1 frank frank 1.4M 2006-05-09 21:19 policy.unc.pdf
frank@riesling:~/area$ lh /usr/share/doc/debian-policy/policy.pdf.gz
-rw-r--r-- 1 root root 471K 2006-04-26 07:27 /usr/share/doc/debian-policy/policy.pdf.gz
So uncompressing and recompressing with pdftk somehow gives a slightly
larger file, whereas gzipping the original saves 28%. In other words,
the original PDF ZIP compression saves 53% compared to the uncompressed
PDF file, and additional gzipping saves 67% all in all. And bzip2 on the
uncompressed PDF file yields
-rw-r--r-- 1 frank frank 332K 2006-05-09 21:19 policy.unc.pdf.bz2
or a 76% space saving.
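For anyone who wants to recheck these percentages, here is a minimal
sketch of the arithmetic in plain sh. The function name "saving" is made
up for illustration, and the sizes are the K figures as printed in the
listings above, so they are already rounded.

```shell
#!/bin/sh
# saving ORIG NEW: print how much smaller NEW is than ORIG,
# as a whole percentage rounded to the nearest integer.
# (Hypothetical helper, not part of any Debian tool.)
saving() {
    orig=$1
    new=$2
    # Adding orig/2 before the integer division rounds to nearest.
    echo $(( (100 * (orig - new) + orig / 2) / orig ))
}

# policy.pdf (657K) versus policy.pdf.gz (471K)
saving 657 471    # prints 28
```

The figures relative to the uncompressed file depend on what 1.4M
really is in K, since the listing rounds it; plugging in roughly 1400
lands within a point of the numbers quoted.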
To me, this sounds like it's generally not worth compressing
already-compressed PDF files with gzip; if we went for size, we should
use bzip2.
However, we have to keep in mind that some -doc packages consist mainly
of PDF files and would significantly increase in size without gzip (or
need some additional splitting).
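To repeat the gzip versus bzip2 comparison on some other file, something
like the following could be used. It is only a sketch: "packed_size" is
a made-up helper name, and it assumes the compressor accepts -9 and -c,
which both gzip and bzip2 do.

```shell
#!/bin/sh
# packed_size COMPRESSOR FILE: print the size in bytes that FILE would
# have after compression, without writing a compressed file to disk.
# (Hypothetical helper; COMPRESSOR is assumed to understand -9 and -c.)
packed_size() {
    # The arithmetic expansion strips any padding wc may print.
    echo $(( $("$1" -9 -c "$2" | wc -c) ))
}

# Example usage on a file of your choice:
#   packed_size gzip  policy.unc.pdf
#   packed_size bzip2 policy.unc.pdf
```

Run it on both the original and the pdftk-uncompressed PDF; the
comparison above used bzip2 on the uncompressed one.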
Regards, Frank
--
Frank Küster
Single Molecule Spectroscopy, Protein Folding @ Inst. f. Biochemie, Univ. Zürich
Debian Developer (teTeX)