
Re: How to extract text from PDF?



On Wed March 5 2008 15:20:57 Andrius wrote:
> technical question: is it possible to extract text from PDF? From PDF to
> txt.

If the PDF was built from text, then pdftotext will extract the text.
pdftotext is in the xpdf-utils package.  Be careful: if you don't
explicitly specify an output file, pdftotext writes to a file named
after the PDF with a .txt extension (foo.pdf becomes foo.txt), possibly
overwriting a file you'd rather not have overwritten.
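
One way to guard against the accidental overwrite is to always pass an
explicit output file and check for it first.  The following is only a
rough sketch in Python: the helper names (build_pdftotext_command,
extract_text) and the overwrite flag are made up for illustration, and
it assumes pdftotext is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path):
    """Return the argument list for pdftotext with an explicit output file.

    Hypothetical helper: it only assembles "pdftotext <pdf> <txt>".
    """
    return ["pdftotext", str(pdf_path), str(txt_path)]


def extract_text(pdf_path, txt_path, overwrite=False):
    """Run pdftotext, refusing to clobber an existing output file.

    Assumes the pdftotext binary is available on PATH.
    """
    pdf_path = Path(pdf_path)
    txt_path = Path(txt_path)
    # Check before running, since pdftotext itself will not ask.
    if txt_path.exists() and not overwrite:
        raise FileExistsError(f"{txt_path} already exists; refusing to overwrite")
    # check=True raises CalledProcessError if pdftotext exits non-zero.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path
```

Calling extract_text("report.pdf", "report-extracted.txt") would then
fail loudly if report-extracted.txt is already there, instead of
silently replacing it.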

If the PDF was built from an image - e.g. a scanned document - you'd
need some kind of OCR.
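
If you don't know which kind of PDF you have, a crude check is to look
at what the text extraction produced: a scanned document typically
yields little or nothing besides whitespace.  A small sketch in Python,
where the function name needs_ocr and the 20-character threshold are
arbitrary choices for illustration, not anything pdftotext provides:

```python
def needs_ocr(extracted_text, min_chars=20):
    """Heuristic guess: True if almost no visible text was extracted.

    The min_chars threshold is an assumption; tune it for your documents.
    """
    # str.split() with no arguments drops all whitespace, including the
    # form feeds that commonly separate pages in extracted text.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

This is only a hint, not a reliable test: a PDF can mix text pages with
scanned pages.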

--Mike Bird

