
PDF files and dh_compress



I'm sorry if this question has been discussed before, but I couldn't find
it with Google, and I think it is too early to raise it on -dev.

I have finally become annoyed enough by compressed pdf.gz files in -doc
packages that I decided to check whether compressing them is required
(by Debian Policy, or the Developer's Reference?) and/or is common practice.

Let me first give some numbers characterizing the current situation:

Total number of pdf files present in sid:
> apt-file search .pdf | grep '\.pdf\(\.gz\)*$' >| pdf.files
> wc  -l pdf.files
2485 pdf.files

How many pdfs lie outside of doc (just out of curiosity):
> grep -v 'usr/share/doc' pdf.files   | wc -l
476

And whatever is within share/doc,
gzipped:
> grep 'usr/share/doc/.*\.pdf\.gz$' pdf.files | wc -l
1095
raw pdf:
> grep 'usr/share/doc/.*\.pdf$' pdf.files | wc -l
914

So the split is roughly 50/50 (about 55% gzipped), which means about half
of the maintainers adjust debian/rules to exclude .pdf files. I have been
too lazy to look for a pattern here -- maybe cdbs takes care of keeping
them uncompressed automatically?

And if we look only at -doc packages, which are intended to provide
documentation (i.e. information that is ready to read, not yet another
gzipped single-file ball that needs to be decompressed before viewing):
> grep 'usr/share/doc/[^/]*-doc/.*\.pdf\.gz$' pdf.files | wc -l
253
> grep 'usr/share/doc/[^/]*-doc/.*\.pdf$' pdf.files | wc -l
573
the situation is slightly better: more than 2/3 keep PDFs uncompressed in
-doc packages.
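
For anyone who wants to re-check the proportions, here is a minimal sketch
of the arithmetic. The helper name pct is made up for illustration; the
counts are the ones printed by the grep | wc -l runs above.

```shell
#!/bin/sh
# Share of one count in a total, as a percentage with one decimal place.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# Counts copied from the grep | wc -l runs above.
doc_gz=1095;  doc_raw=914     # everything under usr/share/doc
ddoc_gz=253;  ddoc_raw=573    # only usr/share/doc/*-doc/

echo "share/doc, gzipped: $(pct "$doc_gz" $((doc_gz + doc_raw)))%"     # 54.5%
echo "-doc pkgs, raw pdf: $(pct "$ddoc_raw" $((ddoc_gz + ddoc_raw)))%"  # 69.4%
```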

This simple arithmetic shows, though, that there is no agreement or clear
policy (or at least it is not followed) on how PDFs should be handled. Of
course PDFs are not compressed as tightly as with gzip -9, but they are
compressed internally and are usually less than 10% larger than their
.pdf.gz versions. At the very least I would expect all -doc packages to
ship uncompressed .pdfs, since in my experience none of the PDF viewers
handles transparent decompression of pdf.gz.

A few questions now:

So is there a recommendation anywhere in the Developer's Reference or
Debian Policy regarding PDF files?

Shouldn't it be recommended (within the Developer's Reference or Debian
Policy) to keep PDFs not compressed with gzip on top (at least in -doc
packages)?

Obviously dh_compress doesn't bother checking whether there is a good
reason to compress a file (such as a threshold gain above which the file
is worth compressing). I doubt that this is worth implementing, but I
think it should at least take care not to compress PDFs in -doc packages.
What do you think?
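
To make the "threshold gain" idea concrete, here is a rough sketch of such
a check. It is not something dh_compress does; the function names
gzip_gain and worth_compressing and the default threshold of 10% are
assumptions chosen only for illustration.

```shell
#!/bin/sh
# gzip_gain FILE: print the percentage of bytes gzip -9 would save
# (truncated to an integer; negative if the .gz would be larger).
gzip_gain() {
    orig=$(wc -c < "$1")
    packed=$(gzip -9 -c "$1" | wc -c)
    awk -v o="$orig" -v p="$packed" \
        'BEGIN { if (o + 0 == 0) print 0; else printf "%d\n", 100 * (o - p) / o }'
}

# worth_compressing FILE [MIN_GAIN]: succeed only if the saving reaches
# MIN_GAIN percent (hypothetical default: 10).
worth_compressing() {
    [ "$(gzip_gain "$1")" -ge "${2:-10}" ]
}
```

It could be tried on a single file with something like
"worth_compressing manual.pdf || echo 'leave it alone'", where manual.pdf
stands for any PDF at hand.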

As always, depending on the answers to the previous questions, maybe it
is worth providing linda/lintian warnings about doubly compressed
files, or at least about compressed PDFs in -doc packages.
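
Just to illustrate what such a warning could report, here is a small
sketch that works on apt-file style "package: path" lines. This is not how
linda or lintian implement their checks, and the tag name
gzipped-pdf-in-doc-package is invented for this example.

```shell
#!/bin/sh
# flag_gzipped_pdfs: read "package: path" lines on stdin and print a
# lintian-style warning for each gzipped PDF under usr/share/doc/<name>-doc/.
flag_gzipped_pdfs() {
    grep 'usr/share/doc/[^/]*-doc/.*\.pdf\.gz$' |
    while IFS= read -r line; do
        # ${line%%: *} is the package name, ${line#*: } is the file path.
        echo "W: ${line%%: *}: gzipped-pdf-in-doc-package ${line#*: }"
    done
}
```

With the list built earlier it would be run as
"flag_gzipped_pdfs < pdf.files".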

Thank you in advance
-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]

