
Re: Convert a pdf to text



On 22/09/11 15:31, Sharon Kimble wrote:
> I have a 96 page pdf file that I need to convert to text in one run.
> I've imported it into inkscape but that only converts one page at a
> time. I've tried using pdftotext but I can't work out the syntax for
> that so am unable to test it out properly. I've tried pdfedit but that
> only works on one page at a time and doesn't convert it to text.
> 
> Can anyone help me out with suggestions for converting the pdf in one
> go to text please?
> 
> Many thanks
> Sharon.

Do you mean one multi-page pdf, or many separate pdfs?

Converting all of a multi-page pdf is just:-
$ pdftotext multipage_example.pdf

which will produce a single text file called multipage_example.txt
containing all the text from the pdf.
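
In case the syntax is the sticking point, here is a rough sketch of a
few common invocations, wrapped in two small shell functions. The
function names and output file names are made up for illustration; the
-layout, -f and -l options are from the poppler version of pdftotext,
so check "man pdftotext" if yours behaves differently.

```shell
# Convert a single pdf in a few common ways.
# (convert_one is a hypothetical helper name, not part of pdftotext.)
convert_one() {
    pdf=$1
    base=${pdf%.pdf}
    pdftotext "$pdf"                                 # all pages -> "$base.txt"
    pdftotext -layout "$pdf" "$base-layout.txt"      # try to keep the physical layout
    pdftotext -f 10 -l 20 "$pdf" "$base-p10-20.txt"  # pages 10 to 20 only
}

# Convert every pdf in a directory, producing one .txt per pdf.
# (convert_all is likewise a hypothetical helper name.)
convert_all() {
    dir=${1:-.}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue   # directory has no pdfs: skip the unexpanded glob
        pdftotext "$f"
    done
}
```

So "convert_one multipage_example.pdf" covers the single-file case, and
"convert_all ." would cover a directory full of separate pdfs.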

If you want to preserve the formatting, try pdftohtml.
If some (or all) of the content is images of text try tesseract - though
you'll have to do a little preparation.
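
The preparation for tesseract is mostly turning each page into an image
first. One possible way, assuming pdftoppm (from poppler-utils) and
tesseract are both installed, is sketched below. The function name, the
300 dpi resolution and the temporary directory are my own choices, not
requirements, and OCR quality will depend heavily on the scan.

```shell
# OCR a scanned pdf into a single text file next to it.
# (ocr_pdf is a hypothetical helper name; 300 dpi is an assumed resolution.)
ocr_pdf() {
    pdf=$1
    base=${pdf%.pdf}
    work=$(mktemp -d) || return 1

    # Render every page to a PNG: page-01.png, page-02.png, ...
    pdftoppm -r 300 -png "$pdf" "$work/page" || return 1

    # tesseract appends .txt to the output base name it is given.
    for img in "$work"/page-*.png; do
        [ -e "$img" ] || continue
        tesseract "$img" "${img%.png}"
    done

    # Page numbers are zero-padded, so the glob is already in page order.
    cat "$work"/page-*.txt > "$base.txt"
    rm -rf "$work"
}
```

Running "ocr_pdf scanned.pdf" should then leave the recognised text in
scanned.txt.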

Okular will also export a pdf to text (provided all the text in the pdf
is actual text, not images).

Cheers

-- 
"People say to me, "Bill, quit bringing up Kennedy, man. Let it go. It
was a long time ago. Just forget about it."
All right, then don't bring up Jesus to me. I mean, as long as we're
talking shelf-life here.
"You know, Bill, Jesus died for you …" Yeah, it was a long time ago.
Forget about it.
How about this: get Pilate to release the [beep]in' files. Quit washing
your hands, Pilate, and release the files. Who else was on that grassy
Golgotha that day? Oh yeah, the three Roman peasants in $100 sandals"
— Bill Hicks

