Re: Bug#337084: tetex-base: latex/dvips/ps2pdf produces buggy pdf files

To: debian-tetex-maint@lists.debian.org
Subject: Re: Bug#337084: tetex-base: latex/dvips/ps2pdf produces buggy pdf files
From: Frank Küster <frank@debian.org>
Date: Thu, 03 Nov 2005 12:50:01 +0100
Message-id: <87wtjqymqe.fsf@alhambra.kuesterei.ch>
In-reply-to: <20051102210120.GA24230@thinkpad> (Ralf Stubner's message of "Wed, 2 Nov 2005 22:01:20 +0100")
References: <8764rbvyh3.fsf@alhambra.kuesterei.ch> <20051102161016.GE2704@preusse> <87d5ljuhlj.fsf@alhambra.kuesterei.ch> <20051102181030.GF2704@preusse> <20051102210120.GA24230@thinkpad>

Ralf Stubner <ralf.stubner@web.de> wrote:

> Text extraction from PDF is really complicated. If one adds a few
> interesting things (fi, ä, ß) to Frank's test file, one finds with
> pdftotext (best used via 'less <pdf-file>') that 'fi' is not found at
> all, 'ä' is found, and 'ß' is found as 'ÿ', even when processed with
> pdflatex. IIRC there is some stage in the text-extraction where some
> default encoding (Latin-1 or something similar) is used. pdflatex
> probably includes the Type3 font with an encoding equivalent to T1. Now
> the code position of 'fi' in T1 is not defined in Latin-1, the code
> position of 'ß' in T1 is 'ÿ' in Latin-1, the code position of 'ä' is the
> same in both. So this fits. I guess that ghostscript changes the
> encoding of the Type3 font when creating the PDF, which makes text
> extraction rather meaningless. If one uses Type1 fonts, ghostscript is
> probably able to use a sensible encoding based on the glyphnames in the
> font. 
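
The code-position argument above can be checked with a short Python
sketch. It is only a model of the guessed mechanism (a T1 code position
read as if it were Latin-1), not of what pdftotext or ghostscript
actually do. The three T1 (Cork) positions are taken from the T1
encoding table as I remember it, and the names T1_POSITIONS and
extracted are made up for this illustration.

```python
# Assumed T1 (Cork) code positions of the three test glyphs.
T1_POSITIONS = {"fi": 0x1C, "ä": 0xE4, "ß": 0xFF}


def extracted(glyph):
    """Read the glyph's T1 code position as a Latin-1 character.

    Returns None when Latin-1 has no printable character at that
    position, which models "not found at all".
    """
    ch = bytes([T1_POSITIONS[glyph]]).decode("latin-1")
    return ch if ch.isprintable() else None


if __name__ == "__main__":
    for glyph in T1_POSITIONS:
        print(glyph, "->", extracted(glyph))
```

Under these assumptions 'fi' maps to a control character and is lost,
'ä' survives, and 'ß' comes out as 'ÿ', which matches the observation.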

That all sounds very sensible, *but*: on dctt, where this first came up
(thread started by "Nils"), several people said that they could use the
find function on PDF files. I assume they read the question properly
and used latex/dvips/ps2pdf.

Regards, Frank
-- 
Frank Küster
Inst. f. Biochemie der Univ. Zürich
Debian Developer
