Bug#424404: PDF files are much larger with lmodern than with CM fonts

To: "Thanh Han The" <hanthethanh@gmail.com>
Cc: "Martin Schröder" <martin@oneiros.de>, "Norbert Preining" <preining@logic.at>, "Reinhard Kotucha" <reinhard.kotucha@web.de>, "Hartmut Henkel" <hartmut_henkel@gmx.de>
Subject: Bug#424404: PDF files are much larger with lmodern than with CM fonts
From: Reinhard Kotucha <reinhard.kotucha@web.de>
Date: Thu, 17 May 2007 01:47:37 +0200
Message-id: <17995.38937.202585.346042@zarniwoop.ms25.local>
Reply-to: Reinhard Kotucha <reinhard.kotucha@web.de>, 424404@bugs.debian.org
In-reply-to: <74f506dc0705160427v15c62768x526a6d2d83e34ee8@mail.gmail.com>
References: <74f506dc0705160427v15c62768x526a6d2d83e34ee8@mail.gmail.com>

Hi,
>>>>> "Thanh" == Thanh Han The <hanthethanh@gmail.com> writes:

  > this is a feature/limitation/bug/<call-what-you-like> of pdftex:
  > it doesn't remove unused Subs entries (this is very tricky and
  > dangerous), but replaces them by a dummy one.  

Let me give a few additional explanations.  Glyph descriptions are
stored in the so-called CharStrings dictionary.  CharStrings can
contain subroutine calls.  Subroutines are stored in the Subrs array.
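A toy model may help to make this concrete.  The representation below
is an assumption for illustration, not the binary Type1 charstring
encoding: CharStrings and Subrs are Python dicts, tokens are strings,
and a tuple ("callsubr", n) stands for the real sequence "n callsubr".
Because subroutines may themselves call subroutines, finding the
entries a subset needs is a transitive closure.

```python
# Toy model of a Type1 font's private data (hypothetical representation):
# charstring tokens are plain strings, except ("callsubr", n), which stands
# for the real "n callsubr" operator sequence.
from typing import Dict, List, Set, Tuple, Union

Token = Union[str, Tuple[str, int]]

def used_subrs(charstrings: Dict[str, List[Token]],
               subrs: Dict[int, List[Token]],
               glyphs: List[str]) -> Set[int]:
    """Return the indices of all Subrs entries reachable from `glyphs`."""
    used: Set[int] = set()
    stack: List[List[Token]] = [charstrings[g] for g in glyphs]
    while stack:
        program = stack.pop()
        for tok in program:
            if isinstance(tok, tuple) and tok[0] == "callsubr":
                idx = tok[1]
                if idx not in used:
                    used.add(idx)
                    stack.append(subrs[idx])  # subroutines may call subroutines
    return used

subrs = {0: ["return"], 4: [("callsubr", 5), "return"], 5: ["return"], 6: ["return"]}
charstrings = {"e": [("callsubr", 4), "endchar"], "a": [("callsubr", 6), "endchar"]}
print(sorted(used_subrs(charstrings, subrs, ["e"])))  # [4, 5]
```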

Although a Type1 font is a PostScript program, it is also possible to
write a parser which supports only the small subset of PostScript
instructions allowed in Type1 fonts.  And this is what most font
renderers do.

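Omitting unused subroutines is actually no problem if such an engine
is used.  The Subrs array is created with its full size and each
subroutine is stored at the index given explicitly in the font
(dup <index> ... NP, where NP is usually defined as noaccess put), so
a missing entry simply leaves a gap: PostScript supports sparse arrays
in this sense.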

However, some engines evidently do not understand even this minimal
subset of PostScript code.  They just create an array and put all the
subroutines into it in the order in which they appear.  Unfortunately
they ignore the array index given in the font.  That means that if
Subrs[x] is removed because it is not used by any CharString of the
subset, Subrs[x] will contain the value which is supposed to be in
Subrs[x+1].  In other words: some engines do not support sparse
arrays.
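The difference between the two kinds of parsers can be sketched as
follows.  The sketch uses a simplified text form in which the binary
charstring is replaced by a readable placeholder, and it assumes the
tokens RD and NP (some fonts use -| and | instead).  An
index-honoring parser keeps the gap left by a removed entry; a naive
one that just appends shifts every later entry down by one.

```python
import re
from typing import Dict, List

# Simplified Subrs entries; entry 2 has been removed from the subset.
SUBRS_TEXT = """\
dup 0 2 RD s0 NP
dup 1 2 RD s1 NP
dup 3 2 RD s3 NP
dup 4 2 RD s4 NP
"""

ENTRY = re.compile(r"dup (\d+) (\d+) RD (\S+) NP")

def parse_with_index(text: str) -> Dict[int, str]:
    # Honors the explicit index, like a real "put": the gap stays empty.
    return {int(m.group(1)): m.group(3) for m in ENTRY.finditer(text)}

def parse_sequential(text: str) -> List[str]:
    # Ignores the index and appends: everything after a gap shifts down.
    return [m.group(3) for m in ENTRY.finditer(text)]

good = parse_with_index(SUBRS_TEXT)
bad = parse_sequential(SUBRS_TEXT)
print(good.get(2), bad[2])  # None s3
```

With x = 2, the naive parser's Subrs[2] holds what belongs in
Subrs[3], which is exactly the failure described above.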

What pdftex and dvips currently do is replace unused subroutines with
subroutines which contain a return statement only.  This is quite
safe.  I don't think that removing unused subroutines is very
dangerous, but it is a bit inconvenient and makes font inclusion a
bit slower.  It is probably necessary to parse the CharStrings first,
create an array which maps original Subrs indices to new ones, and
then write out the new Subrs array and a CharStrings dictionary which
uses the new indices.
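A minimal sketch of that renumbering step, using a toy representation
(hypothetical, not pdftex's writet1.c): tokens are strings, and
("callsubr", n) stands for "n callsubr".  It keeps Subrs 0 to 3 in
place, because the Type 1 specification reserves them for flex and
hint replacement.  It also assumes every subroutine number appears
directly in front of callsubr; in real fonts hint replacement passes
the number through callothersubr (subr# 1 3 callothersubr pop
callsubr), which is one reason why this is trickier than it looks.

```python
from typing import Dict, List, Set, Tuple, Union

Token = Union[str, Tuple[str, int]]
RESERVED = (0, 1, 2, 3)  # Subrs 0-3: flex and hint replacement, kept in place

def renumber(charstrings: Dict[str, List[Token]],
             subrs: Dict[int, List[Token]],
             used: Set[int]) -> Tuple[List[List[Token]], Dict[str, List[Token]]]:
    """Drop unused Subrs and rewrite calls to use compact indices.

    `charstrings` holds only the glyphs of the subset, and `used` must
    already contain every subroutine reachable from them.
    """
    others = sorted(i for i in used if i not in RESERVED)
    mapping = {i: i for i in RESERVED}  # old index -> new index
    mapping.update({old: len(RESERVED) + k for k, old in enumerate(others)})

    def rewrite(program: List[Token]) -> List[Token]:
        return [("callsubr", mapping[t[1]])
                if isinstance(t, tuple) and t[0] == "callsubr" else t
                for t in program]

    # Reserved entries are copied verbatim (a bare return if the font lacks one).
    new_subrs = [subrs.get(i, ["return"]) for i in RESERVED]
    new_subrs += [rewrite(subrs[old]) for old in others]  # dense, no gaps
    new_charstrings = {name: rewrite(p) for name, p in charstrings.items()}
    return new_subrs, new_charstrings

subrs = {0: ["flex0"], 1: ["flex1"], 2: ["flex2"], 3: ["hintrepl"],
         10: [("callsubr", 20), "return"], 20: ["return"], 30: ["return"]}
charstrings = {"e": [("callsubr", 10), "endchar"]}  # only the subset's glyphs
new_subrs, new_cs = renumber(charstrings, subrs, {10, 20})
print(new_cs["e"])  # [('callsubr', 4), 'endchar']
```

The output Subrs array is dense, so it also works with engines that
ignore the explicit index.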

There is another important point: if x is the largest index of a
subroutine the current subset requires, subroutines with indices > x
will be removed.

I do not know how much space the dummy subroutines require in a PDF
file.  Maybe I should create a font which has two identical glyphs,
where one glyph uses Subrs[4] and the other calls the identical
subroutine Subrs[1004].
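Until such an experiment has been done, a rough estimate is possible.
The sketch assumes that each dummy entry is written as
"dup <index> 5 RD <5 bytes> NP" plus a newline: the dummy charstring is
the one-byte return operator preceded by the default lenIV of 4
random bytes, and eexec encryption does not change the length.  The
token spelling (RD/NP rather than -| and |) and the exact layout
pdftex writes are assumptions.

```python
def dummy_entry_bytes(index: int, token_rd: str = "RD",
                      token_np: str = "NP", len_iv: int = 4) -> int:
    """Bytes one dummy Subrs entry occupies (assumed layout, see above).

    The dummy charstring is the 1-byte return operator plus len_iv
    leading bytes; encryption preserves the length.
    """
    cs_len = len_iv + 1
    head = f"dup {index} {cs_len} {token_rd} "
    tail = f" {token_np}\n"
    return len(head) + cs_len + len(tail)

def dummy_block_bytes(first: int, last: int) -> int:
    """Total size of dummy entries first..last (inclusive)."""
    return sum(dummy_entry_bytes(i) for i in range(first, last + 1))

# Glyph calling Subrs[1004] instead of Subrs[4]: entries 4..1003 become dummies.
print(dummy_block_bytes(4, 1003))  # 21902
```

Under these assumptions the 1000 dummies cost roughly 21 KB, and since
they sit in the encrypted part of the font, compression will not
shrink them much.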

Anyway, as long as it is unclear how much is gained by changing
writet1.c, I think that the best way to experiment with such things is
to put the Subrs and CharStrings arrays into Lua tables.  Maybe this
has already been done in LuaTeX.

  > lmr10 has a very large number of subroutines (~550), while cmr10
  > has only 3.

I don't think that the large number of subroutines is the problem.  As
I said above, pdftex and dvips remove all subroutines which have a
larger index than the subroutine with the largest index which is
actually needed.

The lm fonts are horribly inefficient in this respect.  It would make
sense for characters which are used frequently (the ASCII alphabet,
for instance) to use subroutines which are close to Subrs[0].

But in lm, the most common character in Western languages, /e, calls
Subrs[534], /a calls Subrs[561], and so on.
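Putting the two observations together: since pdftex and dvips keep
every index up to the largest one needed and fill the gaps with
dummies, the lm numbering alone makes even tiny subsets expensive.
The sketch below models that behaviour under simplifying assumptions
(made-up subroutine bodies, Subrs 0 to 3 always counted as needed,
and /e and /a needing only Subrs[534] and Subrs[561]); it is not the
actual writet1.c code.

```python
from typing import Dict, List, Set

DUMMY = ["return"]  # a subroutine containing a return statement only

def dummy_subset(subrs: Dict[int, List[str]], used: Set[int]) -> List[List[str]]:
    """Keep Subrs[0..x], x = largest needed index; replace unused ones by DUMMY.

    Indices stay valid, so no CharString needs rewriting, but every
    unused entry below x still occupies a slot in the output.
    """
    if not used:
        return []
    x = max(used)
    return [subrs[i] if i in used else DUMMY for i in range(x + 1)]

# Toy font with 600 subroutines; the bodies are made up.
subrs = {i: [f"body{i}", "return"] for i in range(600)}
needed = {0, 1, 2, 3, 534, 561}  # Subrs 0-3 plus those assumed for /e and /a
out = dummy_subset(subrs, needed)
print(len(out), sum(1 for s in out if s is DUMMY))  # 562 556
```

A subset containing just /e and /a would thus carry 556 dummy entries,
whereas a font whose common glyphs used low indices would carry almost
none.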

  > Another factor is that t1 fonts are encrypted and hence they
  > cannot be compressed effectively, so all these dummy entries
  > didn't get much smaller after compression and therefore the result
  > is a much larger size.

That's true, but the only reasonable solution is to convert Type1
fonts to CFF.  As you already said, that's not a weekend project.  I'm
quite optimistic, though: if Taco doesn't want to depend on external
libraries when he provides OTF support, I assume that he will have to
delve into it anyway.
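Thanh's remark about encryption is easy to check.  The Subrs and
CharStrings live in the eexec-encrypted part of the font, and eexec is
a simple byte-wise stream cipher (the constants 55665, 52845 and 22719
are from Adobe's Type 1 specification).  The sketch encrypts 1000
dummy-like entries and compares how well zlib (the Flate algorithm
used in PDF) compresses the plain and the encrypted bytes.  The 5-byte
charstring and the four leading zero bytes are placeholders; real
fonts use random bytes there.

```python
import zlib

EEXEC_KEY, C1, C2 = 55665, 52845, 22719  # constants from the Type 1 spec

def eexec_encrypt(plain: bytes, key: int = EEXEC_KEY) -> bytes:
    """eexec encryption: each cipher byte feeds back into the key state."""
    r = key
    out = bytearray()
    for p in plain:
        c = p ^ (r >> 8)
        r = ((c + r) * C1 + C2) & 0xFFFF
        out.append(c)
    return bytes(out)

# 1000 dummy Subrs entries; the 5-byte charstring is a fixed placeholder.
plain = b"".join(b"dup %d 5 RD \x01\x02\x03\x04\x05 NP\n" % i
                 for i in range(4, 1004))
# eexec prepends 4 bytes to the plaintext (random in real fonts, zeros here).
cipher = eexec_encrypt(b"\x00\x00\x00\x00" + plain)

print(len(plain), len(zlib.compress(plain, 9)), len(zlib.compress(cipher, 9)))
```

On the plain text, Flate exploits the repetition between consecutive
lines; after encryption there is little repetition left to find, so
the dummy entries cost close to their full size in the PDF.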

On the other hand, pdftex will support OTF in the future, and in OTF,
Type1-style outlines are stored in CFF format.  The easiest way to
decrease PDF file size will then be to convert Type1 fonts to OTF
using an external tool.  AFAIK fontforge is already able to convert
Type1 to OTF.

Regards,
  Reinhard

-- 
----------------------------------------------------------------------------
Reinhard Kotucha			              Phone: +49-511-4592165
Marschnerstr. 25
D-30167 Hannover	                      mailto:reinhard.kotucha@web.de
----------------------------------------------------------------------------
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
----------------------------------------------------------------------------
