Re: How to extract text from PDF?

To: debian-user@lists.debian.org
Cc: Andrius <pantor@painter-decorator.eu>
Subject: Re: How to extract text from PDF?
From: Mike Bird <mgb@yosemite.net>
Date: Wed, 5 Mar 2008 15:28:38 -0800
Message-id: <200803051528.38778.mgb@yosemite.net>
In-reply-to: <47CF2AD9.6080406@painter-decorator.eu>
References: <47CF2AD9.6080406@painter-decorator.eu>

On Wed March 5 2008 15:20:57 Andrius wrote:
> technical question: is it possible to extract text from PDF? From PDF to
> txt.

If the PDF was built from text, pdftotext will extract the text.
pdftotext is in the xpdf-utils package.  Be careful: if you don't
explicitly specify an output file, pdftotext will create one named
after the input (foo.pdf becomes foo.txt), possibly overwriting a
file you'd rather not have overwritten.
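A minimal Python sketch of two ways to avoid that surprise, assuming pdftotext is on your PATH. Passing "-" as the output file makes pdftotext write the text to stdout, so nothing lands on disk. The helper extract_to_file (a hypothetical name, like every function here) refuses to write if the target file already exists. The default-name rule in default_output_path is my understanding of pdftotext's behaviour (foo.pdf becomes foo.txt), and "-enc UTF-8" is used so the output can be decoded predictably.

```python
import os
import subprocess
from typing import List, Optional


def default_output_path(pdf_path: str) -> str:
    """Guess the file pdftotext writes when no output file is given.

    Assumed rule: a trailing ".pdf" or ".PDF" is replaced by ".txt";
    any other name just gets ".txt" appended.
    """
    if pdf_path.endswith((".pdf", ".PDF")):
        return pdf_path[:-4] + ".txt"
    return pdf_path + ".txt"


def pdftotext_command(pdf_path: str, output: str = "-") -> List[str]:
    """Build the command line; an output of "-" means write to stdout."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, output]


def extract_text(pdf_path: str) -> str:
    """Return the PDF's text without creating any file on disk."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_to_file(pdf_path: str, txt_path: Optional[str] = None) -> str:
    """Write the text to a file, refusing to clobber an existing one."""
    target = txt_path or default_output_path(pdf_path)
    # Simple existence check; not race-free, but enough for interactive use.
    if os.path.exists(target):
        raise FileExistsError(f"refusing to overwrite {target}")
    subprocess.run(pdftotext_command(pdf_path, target), check=True)
    return target
```

From the shell the same stdout trick is simply "pdftotext file.pdf - | less", or you can name the output file explicitly every time.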

If the PDF was built from images (e.g. a scanned document), there is
no text in it for pdftotext to extract; you'd need some kind of OCR.
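One possible OCR route, sketched in Python under loud assumptions: pdftoppm from poppler-utils (with PNG output support) renders each page to an image, and the tesseract command from the tesseract-ocr package reads each image. Neither tool comes from the original post, and the function names are hypothetical. Passing "stdout" as tesseract's output base makes it print the recognised text instead of writing a .txt file.

```python
import glob
import os
import subprocess
import tempfile
from typing import List


def render_command(pdf_path: str, prefix: str, dpi: int = 300) -> List[str]:
    # pdftoppm writes one image per page, named <prefix>-<page>.png
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_command(image_path: str) -> List[str]:
    # "stdout" as the output base makes tesseract print text to stdout
    return ["tesseract", image_path, "stdout"]


def ocr_pdf(pdf_path: str, dpi: int = 300) -> str:
    """Rasterise every page, OCR each image, and join the page texts."""
    with tempfile.TemporaryDirectory() as workdir:
        prefix = os.path.join(workdir, "page")
        subprocess.run(render_command(pdf_path, prefix, dpi), check=True)
        # pdftoppm zero-pads page numbers to a common width, so a plain
        # sort keeps the pages in order.
        pages = sorted(glob.glob(prefix + "-*.png"))
        texts = []
        for image in pages:
            result = subprocess.run(
                ocr_command(image), capture_output=True, check=True
            )
            texts.append(result.stdout.decode("utf-8", errors="replace"))
        return "\n".join(texts)
```

Higher resolutions usually help recognition on small print at the cost of speed; 300 dpi is just a common starting point, not a requirement.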

--Mike Bird
